AD A110955


Dimensionality  of  Measured 
Achievement  Over  Time 


Kathleen  A.  Gialluca 
and 

David  J.  Weiss 


Research  Report  81-5 
December  1981 

Computerized  Adaptive  Testing  Laboratory 
Psychometric  Methods  Program 
Department  of  Psychology 
University  of  Minnesota 
Minneapolis,  MN  55455 

This  research  was  supported  by  funds  from  the  Air 
Force  Human  Resources  Laboratory,  the  Army  Research 
Institute,  the  Air  Force  Office  of  Scientific 
Research,  and  the  Office  of  Naval  Research, 
and  monitored  by  the  Office  of  Naval  Research. 

Approved  for  public  release;  distribution  unlimited. 
Reproduction  in  whole  or  in  part  is  permitted  for 
any  purpose  of  the  United  States  Government. 




Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
READ INSTRUCTIONS BEFORE COMPLETING FORM

1. REPORT NUMBER: Research Report 81-5
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): Dimensionality of Measured Achievement Over Time
5. TYPE OF REPORT & PERIOD COVERED: Technical Report
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): Kathleen A. Gialluca and David J. Weiss
8. CONTRACT OR GRANT NUMBER(s): N00014-79-C-0172

9. PERFORMING ORGANIZATION NAME AND ADDRESS: Department of Psychology,
   University of Minnesota, Minneapolis, Minnesota 55455
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
    P.E.: 6115N  Proj.: RR042-04  T.A.: RR042-04-01  W.U.: NR 150-433
11. CONTROLLING OFFICE NAME AND ADDRESS: Personnel and Training Research
    Programs, Office of Naval Research, Arlington, Virginia 22217
12. REPORT DATE: December, 1981
13. NUMBER OF PAGES: 34

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report):
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited. Reproduction in whole or in part is permitted
    for any purpose of the United States Government.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if
    different from Report):
18. SUPPLEMENTARY NOTES: This research was supported by funds from the Air
    Force Human Resources Laboratory, the Army Research Institute, the Air
    Force Office of Scientific Research, and the Office of Naval Research,
    and monitored by the Office of Naval Research.
19. KEY WORDS (Continue on reverse side if necessary and identify by block
    number): achievement testing, change scores, measurement of change,
    measurement of growth, factor comparisons, factor structure
20. ABSTRACT (Continue on reverse side if necessary and identify by block
    number)

Some type of difference or change score is frequently used to quantify the
effects of experimental treatments and educational programs on individuals and
on groups of individuals. Whether the change measurement involves the use of
simple difference scores, their derivatives, or some more complex
methodological design, the measurement process itself assumes that the
treatment or instruction results in higher levels of the originally measured
variable and that the only change that occurs is a quantitative one. If this
assumption is not met, then the computation of any type of difference score is
inappropriate and the scores themselves are useless for measuring growth or
change.

Two studies investigated the tenability of the assumption that classroom
instruction results in increases in students' achievement levels while the
qualitative nature of that achievement remains constant across time. The data
utilized were the item responses to tests in basic mathematics and in general
biology administered as pretests and after instruction to students enrolled in
those courses.

Results indicated that this assumption was not tenable in the biology data
set, where increases in mean achievement level were accompanied by
corresponding changes in the factor structure underlying the item responses.
For the mathematics data, however, there was no such violation of the
assumption: As student achievement levels increased, the underlying factor
structure remained unchanged. The implications of these results for
psychology, education, and program evaluation are noted.



Contents

Introduction
    Purpose
Study I
    Method
        Subjects and Tests
        Analyses
            Differences in Achievement Level Estimates
            Differences in the Structure of Achievement
    Results
        Differences in Achievement Level Estimates
            Total Score Differences
            Item Difficulties
            Correlation Between Scores
        Differences in the Structure of Achievement
            Internal Consistency Reliability
            Number of Factors Extracted
            Factor Similarity
    Conclusions
        Differences in Achievement Level Estimates
        Differences in the Structure of Achievement
Study II
    Method
        Subjects
        Design
            Tests
            Experimental Groups
        Analyses
            Differences in Achievement Level Estimates: Test A
            Differences in the Structure of Achievement: Test A
            Differences in Achievement Level Estimates: Test B
            Differences in the Structure of Achievement: Test B
    Results
        Effects of Item Repetition
        Missing Data
        Differences in Achievement Level Estimates: Test A
            Total Score Differences
            Item Difficulties
        Differences in the Structure of Achievement: Test A
            Internal Consistency Reliability
            Number of Factors Extracted
            Factor Similarity
        Differences in Achievement Level Estimates: Test B
            Total Score Differences
            Item Difficulties
        Differences in the Structure of Achievement: Test B
            Internal Consistency Reliability
            Number of Factors Extracted
            Factor Similarity
    Conclusions
        Differences in Achievement Level Estimates
        Differences in the Structure of Achievement
Discussion and Conclusions
References
Appendix: Supplementary Tables


Acknowledgments 

Data  utilized  in  Study  I  of  this  report  were  obtained  from  students  enrolled 
in  General  College  mathematics  courses  at  the  University  of  Minnesota  during 
fall  quarter  1979.  Appreciation  is  extended  to  these  students  and  to 
Douglas  Robertson,  Mathematics  Coordinator  of  General  College,  for  their 
participation  in  this  research. 

Data  utilized  in  Study  II  of  this  report  were  obtained  from  volunteer 
students  in  General  Biology,  Biology  1-011,  at  the  University  of  Minnesota 
during  winter  quarter  1980.  Appreciation  is  extended  to  these  students,  and 
to  Kathy  Swart  and  Norman  Kerr  of  the  General  Biology  staff,  for  their 
participation  in  this  research.  Gage  Kingsbury  and  Elana  Broch  were 
responsible  for  the  research  design  and  collection  of  biology  data  during 
Winter  1980,  of  which  the  data  reported  herein  were  a  part. 


Technical  Editor:  Barbara  Leslie  Camm 


Dimensionality  of  Measured  Achievement  Over  Time 


The measurement of individual or group change is central to many issues in
the fields of psychology, education, and program evaluation. Psychologists,
educators, and (more recently) evaluators typically use differences in test
scores to quantify the effects of experimental treatments and educational
programs on individuals and on groups of individuals.

The typical paradigm for measuring change involves the administration of a
standardized achievement test both before and after an experimental treatment
or program implementation; the effect of the treatment intervention is then
considered to be a function of the mean difference between the two sets of
test scores. If two or more groups of students are involved, comparisons can
also be made between treatment and control groups, or among groups exposed to
various treatments or involved in several different programs. Again,
evaluation of treatment effects involves comparing the mean achievement gain
(typically, a function of the difference scores) observed for each group.
Individual gain or change is also frequently used to measure an individual's
growth in achievement level or change due to a treatment or special program.

Lord (1963) and Cronbach and Furby (1970), among others, have discussed the
methodological and statistical problems involved in using difference scores to
measure change or growth and have presented some possible solutions. Whether
measurements of change involve the use of simple difference scores, their
derivatives, or some more complex methodological design, the measurement
process itself assumes that the treatment or instruction results in increased
levels of the same trait or characteristic that was measured originally and
that the only change that occurs is a quantitative one.

That this assumption may be violated has long been evident in studies of
intelligence and intellectual growth. Garrett (1946) noted that "intelligence
changes in its organization" (p. 373) and called for corresponding changes in
the way intelligence is measured. This "differentiation hypothesis" spawned
much research (see Reinert, 1970, for a review) concerning the changes in the
structure and organization of intelligence throughout the human life span.
Some of these studies report results supporting the hypothesis of age
differentiation; others offer support for a hypothesis of age integration, and
still others provide evidence in support of both these hypotheses. Nearly all
this research, however, has found that the structure of intelligence, as
defined by factor analysis, does not remain constant with age and experience.

Other authors (Anastasi, 1936; Ferguson, 1954; Games, 1962; Woodrow, 1938,
1939a, 1939b, 1939c) have investigated the changes in verbal ability and
intellectual factor structure that accompany shorter term training and
practice. Similar factor-analytic investigations have been made in the areas
of psychomotor behavior (Fleishman, 1953, 1957, 1960; Fleishman & Hempel,
1954, 1955; Greene, 1943), psycholinguistic abilities (Querishi, 1967), word
association (Sullivan & Moran, 1967; Swartz & Moran, 1968), and even the
learning of Morse code (Fleishman & Fruchter, 1960). All these authors have
found that the factorial structure of abilities underlying task performance
changes in a systematic way with training and practice. An individual's
status at a later point in time, then, may be qualitatively different from
his/her status as originally measured.

Wohlwill  (1970)  discusses  this  issue  of  quantitative  versus  qualitative 
change  more  generally  in  the  area  of  developmental  psychology  and,  like  Garrett 
(1946),  calls  for  more  sophisticated  scaling  methods  which  will 

... allow us to assess an individual's status on a developmental dimension
in a manner such as to ensure not only comparability of content for the
different parts of that dimension, but at the same time a continuous scale
along which developmental change can be charted .... Postulating a unitary
dimension across the age span under investigation presupposes that there
are no major discontinuities in the development of the behavior in
question, such as there obviously are in the assessment of intelligence
when we move from infancy to childhood. (p. 154)

Although Reinert (1970) called for the investigation of possible
factor-structure changes in areas other than intelligence and abilities more
than a decade ago, no research has yet extended this line of questioning into
the area of classroom achievement. That is, there have been no reported
studies that have systematically investigated whether the individual and group
changes that occur after classroom instruction or program participation are
quantitative changes in the level of achievement, as is generally assumed, or
whether more qualitative changes in the structure of the achievement variable
have occurred.

Kingsbury and Weiss (1979) studied the effects of testing students at
different points in instruction. They reported that the single factor
extracted from the item responses to a college general biology examination
administered on the first day of class and the factor extracted from the item
responses to a classroom midquarter examination differed markedly from each
other in terms of strength; however, they could not further investigate the
similarity of the factor pattern loadings from both administrations. They
cautioned that replications of their findings contrasting the pretest factor
with the later achievement factor would render difference scores "completely
useless" as indicators of achievement level growth, since different variables
would, in fact, be measured at the two points in time.

The importance of such a conclusion should not be underestimated. If
different characteristics are, in fact, being measured at two different
occasions, then the computation of any type of difference score is
inappropriate, and the evaluation of program effectiveness and gains in
individual student achievement must be made on some other basis. It is
justifiable to use difference scores (statistical and methodological issues
notwithstanding) only when it can be demonstrated that quantitative changes
are the only changes accompanying instruction.

Purpose 


The objectives of the present studies were to investigate the nature of the
changes in the dimensionality of achievement that occurred following
instruction in two different achievement domains (basic mathematics and
general biology) and to determine the appropriateness of calculating
difference scores in order to measure change in these domains.


STUDY  I 
Method 


Subjects  and  Tests 

Data were obtained from students enrolled in mathematics classes at the
University of Minnesota's General College during the fall quarter of 1979.
These students were administered a 35-item Arithmetic Placement Test (APT) on
the first day of class (pretest) and again as a final examination (posttest).
The APT is composed of five-alternative multiple-choice items covering such
topics as addition, subtraction, multiplication, and division of whole
numbers, fractions, decimals, and percents.

Item  responses  were  coded  as  correct,  incorrect,  or  missing  for  the  259 
students.  However,  only  136  of  the  students  answered  every  item  on  the  APT  on 
both  occasions,  i.e.,  123  students  omitted  or  did  not  reach  at  least  one  item  on 
either  occasion.  In  many  cases,  clusters  of  items  were  omitted  in  the  middle  of 
the  tests,  which  implied  that  students  were  omitting  the  groups  of  items  for 
which  they  did  not  know  the  answers,  rather  than  reaching  a  time  limit  for  the 
test.  To  deal  with  this  problem  of  missing  data,  a  15%-missing-data  criterion 
was  employed.  A  student's  response  protocol  was  deleted  from  the  data  set  if 
the  student  omitted  more  than  five  items  (i.e.,  15%  of  35  items)  on  either  the 
pretest  or  the  posttest.  This  resulted  in  a  group  of  220  students  on  which  all 
further  analyses  were  based.  For  these  220  students,  missing  data  were  coded  as 
incorrect  on  the  assumption  that  the  student  did  not  answer  the  item  because 
he/she  did  not  know  the  answer  and  was  unwilling  to  guess. 
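
The screening rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `apply_missing_data_rule`, the coding of responses as 1, 0, and `None`, and the `max_omits` parameter are hypothetical and do not come from the report.

```python
from typing import List, Optional, Tuple

# Assumed coding: 1 = correct, 0 = incorrect, None = omitted or not reached.
Responses = List[List[Optional[int]]]


def apply_missing_data_rule(
    pretest: Responses, posttest: Responses, max_omits: int = 5
) -> Tuple[List[List[int]], List[List[int]]]:
    """Drop students with too many omits, then score remaining omits as wrong."""
    kept_pre: List[List[int]] = []
    kept_post: List[List[int]] = []
    for pre, post in zip(pretest, posttest):
        # A protocol is deleted if either administration exceeds the limit.
        if pre.count(None) > max_omits or post.count(None) > max_omits:
            continue
        # Remaining omits are recoded as incorrect responses.
        kept_pre.append([0 if x is None else x for x in pre])
        kept_post.append([0 if x is None else x for x in post])
    return kept_pre, kept_post
```

With the default `max_omits=5`, a student is retained only if no more than five of the 35 items were omitted on each administration, which corresponds to the 15% criterion stated above.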

Analyses 

Differences in achievement level estimates. The question of interest with
respect to achievement level estimates was whether there were differences in
achievement level estimates due to instruction, i.e., were students growing or
gaining in achievement levels throughout the course of instruction? Analyses
pertinent to this question included comparisons of the frequency distributions
of number-correct scores both before and after instruction and a t test for
the difference between the means of scores on the pretest and the posttest.
Comparisons were also made of the distributions of item difficulties for each
administration of the APT. The correlation between scores on the pretest and
posttest was computed as an indication of the degree to which the scores were
linearly related.
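
As a rough sketch of the two statistics named above, the dependent-groups t statistic and the Pearson correlation can be computed from paired number-correct scores as follows. The function names are illustrative assumptions, and the report does not say which software was used for these particular computations.

```python
import math
from typing import Sequence


def paired_t(pre: Sequence[float], post: Sequence[float]) -> float:
    """t statistic for the mean of the paired differences (post - pre)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two score vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Both functions assume that the two lists are aligned by student, so that position i in each list refers to the same person.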

Differences in the structure of achievement. A related but less often
investigated issue is whether there are differences in the structure of item
responses due to instruction. Investigation of this issue involved computing
and comparing the values of coefficient alpha as an index of internal
consistency, which is related to the average level of intercorrelation of the
items. More germane to this issue, however, was whether the factor structure
underlying the test changed with instruction or whether it remained constant.
Consequently, principal axes factor analyses were performed separately on the
pretest and posttest item responses. Pearson product-moment correlations were
computed between pairs of item responses, and the diagonal elements of the
interitem correlation matrices were replaced with initial estimates of the
communalities of each item, as given by the squared multiple correlation
between that item and the other items in the matrix. An iterative procedure
for improving these communality estimates was used, successively extracting
factors and re-estimating the communalities. This process continued until the
difference between two successive communality estimates was negligible (see
Nie, Hull, Jenkins, Steinbrenner, & Bent, 1975).
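
The iterative procedure described above can be illustrated with a minimal pure-Python sketch restricted to a single factor. It is not the SPSS routine the authors used: the Gauss-Jordan inverse, the power iteration for the leading eigenvector, the convergence tolerance, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def invert(matrix: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leading_eigenpair(matrix: Matrix, max_iter: int = 1000) -> Tuple[float, List[float]]:
    """Power iteration; assumes the dominant eigenvalue is positive."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        prod = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in prod))
        new_vec = [p / norm for p in prod]
        done = max(abs(a - b) for a, b in zip(new_vec, vec)) < 1e-12
        vec = new_vec
        if done:
            break
    # Rayleigh quotient gives the eigenvalue for the unit-length vector.
    value = sum(v * sum(m * w for m, w in zip(row, vec))
                for row, v in zip(matrix, vec))
    return value, vec


def principal_axis_one_factor(corr: Matrix, tol: float = 1e-6,
                              max_iter: int = 200) -> Tuple[List[float], List[float]]:
    """Iterated principal-axis loadings and communalities for one factor."""
    n = len(corr)
    inverse = invert(corr)
    # Initial communalities: squared multiple correlations.
    comm = [1.0 - 1.0 / inverse[i][i] for i in range(n)]
    loadings = [0.0] * n
    for _ in range(max_iter):
        # Reduced matrix: communalities replace the unit diagonal.
        reduced = [[comm[i] if i == j else corr[i][j] for j in range(n)]
                   for i in range(n)]
        value, vec = leading_eigenpair(reduced)
        loadings = [math.sqrt(max(value, 0.0)) * v for v in vec]
        new_comm = [l * l for l in loadings]
        change = max(abs(a - b) for a, b in zip(new_comm, comm))
        comm = new_comm
        if change < tol:
            break
    return loadings, comm
```

For a correlation matrix generated exactly by one factor, for example loadings of .8, .7, .6, and .5, the sketch should recover loadings close to those values. A production analysis would extract several factors with a full eigendecomposition rather than power iteration.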

Random sets of item responses were generated by simulating the responses of
220 students to 35 items such that the probability of a correct answer by any
simulee to an item was equal to the difficulty (proportion correct) of that
item. This was done separately for the pretest and the posttest. Identical
procedures as performed for the real data were carried out for
intercorrelating the item responses and factoring the resulting matrix. The
results of the factor analyses of real and random data were compared to
determine the number of "nonrandom" factors existing in the real data.

The final factor solutions for the pretest and the posttest were then
compared in terms of numbers of factors extracted and the similarities between
them. Factor similarity was evaluated by computing the root-mean-square
deviation, the product-moment correlation coefficient, and the coefficient of
congruence between the factor loadings of the factors extracted at each test
administration (see Harman, 1976, pp. 343-344). These similarity measures
were compared with values obtained from the two sets of random data, as
recommended by Nesselroade and Baltes (1970).
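
The random comparison data described above (220 simulees, 35 items, probability of a correct response equal to each item's proportion correct) could be generated along the following lines. The function name, the seeding scheme, and the use of Python's `random` module are assumptions; the report does not describe the generator actually used.

```python
import random
from typing import List, Sequence


def simulate_responses(
    difficulties: Sequence[float], n_simulees: int = 220, seed: int = 0
) -> List[List[int]]:
    """Independent 0/1 responses with P(correct) equal to each item's difficulty."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    return [
        [1 if rng.random() < p else 0 for p in difficulties]
        for _ in range(n_simulees)
    ]
```

Because each response is drawn independently, the simulated items share no common factor, which is what makes the resulting loadings and similarity indices a baseline for chance-level structure.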

Results 

Differences  in  Achievement  Level  Estimates 

Total score differences. Frequency distributions of number-correct scores
for both administrations of the APT are presented in Appendix Table A; the
frequency polygons are displayed in Figure 1. This figure shows that although
the distribution of pretest scores was approximately symmetric, the
distribution of posttest scores was negatively skewed, indicating the presence
of a ceiling effect. Only four students answered all 35 items correctly on
the posttest; an additional 77 students (or 35%) incorrectly answered fewer
than four items. The mean score on the pretest was 22.26, the median was
22.74, and the standard deviation was 5.97. For the posttest these statistics
were 28.91, 30.10, and 4.88, respectively. A one-tailed t test for the
difference between means of dependent groups yielded t = 18.67, p < .0001.

Item difficulties. The differences in raw score distributions observed
between pretest and posttest were mirrored in the distributions of item
difficulties for the two administrations of the APT, as shown in Table 1.
Although the pretest items were, on the average, answered correctly more often
than not, nearly a third of them (i.e., 10 of 35) were answered incorrectly by
at least half of the students. For the posttest, however, only two of the
items were as difficult. In fact, one third of the items (12 of 35) were
answered correctly by more than 90% of the students.
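
Item difficulty here is the proportion of students answering an item correctly. A small sketch of how such proportions could be computed and tallied into the ten ranges used in Table 1 follows; the function names and the binning on rounded hundredths are illustrative assumptions.

```python
from typing import List, Sequence


def item_difficulties(responses: Sequence[Sequence[int]]) -> List[float]:
    """Proportion correct for each item (columns of a students-by-items matrix)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]


def difficulty_distribution(difficulties: Sequence[float]) -> List[int]:
    """Counts of items in the ranges .00-.10, .11-.20, ..., .91-1.00."""
    counts = [0] * 10
    for p in difficulties:
        hundredths = round(p * 100)  # work in integers to avoid float edge cases
        index = 0 if hundredths <= 10 else (hundredths - 1) // 10
        counts[index] += 1
    return counts
```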


Figure 1
Grouped Frequency Distribution of Number-Correct Scores
for APT Pretest and Posttest
[Figure not reproduced; horizontal axis: Number-Correct Score]


Table 1
Frequency Distributions of Item Difficulties for APT
Administered as Pretest and as Posttest

Range of Item          Number of Items
Difficulty           Pretest    Posttest
 .00 -  .10             0           0
 .11 -  .20             1           0
 .21 -  .30             1           0
 .31 -  .40             4           0
 .41 -  .50             4           2
 .51 -  .60             5           0
 .61 -  .70             5           3
 .71 -  .80             6           9
 .81 -  .90             5           9
 .91 - 1.00             4          12
Mean Difficulty        .64         .83



Correlation between scores. The Pearson product-moment correlation
coefficient between number-correct scores at the two administrations of the
APT was .542. This relatively low value, coupled with the evidence of mean
score increases, reveals that students did not, to a great extent, maintain
their relative standings in the course after instruction.
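
As a consistency check, the reported t statistic can be approximately recovered from the summary statistics alone, using the standard identity that the variance of a difference score equals the sum of the two variances minus twice the covariance. The function name is hypothetical, and the result is limited by the rounding of the published means, standard deviations, and correlation.

```python
import math


def paired_t_from_summary(mean_pre: float, mean_post: float,
                          sd_pre: float, sd_post: float,
                          r: float, n: int) -> float:
    """Dependent-groups t computed from means, SDs, and the pre-post correlation."""
    # Var(post - pre) = sd_pre^2 + sd_post^2 - 2 * r * sd_pre * sd_post
    var_diff = sd_pre ** 2 + sd_post ** 2 - 2.0 * r * sd_pre * sd_post
    standard_error = math.sqrt(var_diff / n)
    return (mean_post - mean_pre) / standard_error


# Values reported for the APT pretest and posttest (n = 220 students).
t_check = paired_t_from_summary(22.26, 28.91, 5.97, 4.88, 0.542, 220)
```

The value of `t_check` should fall within rounding error of the reported 18.67, which suggests that the reported means, standard deviations, and correlation are mutually consistent.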

Differences  in  the  Structure  of  Achievement 

Internal  consistency  reliability.  The  internal  consistency  reliability  of 
the  APT,  as  indexed  by  coefficient  alpha,  was  .836  for  the  pretest  and  .835  for 
the  posttest.  That  the  reliability  coefficient  remained  essentially  constant 
provides  some  evidence  for  concluding  that  the  items  were  functioning  together 
in  the  same  manner  before  and  after  instruction.  However,  since  the  variance  of 
the  scores  decreased  somewhat  from  pretest  to  posttest  (see  Appendix  Table  A), 
the  stability  of  coefficient  alpha  may  actually  reflect  a  slight  increase  in  the 
average  interitem  correlation. 
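
For reference, coefficient alpha can be computed from an item response matrix as sketched below. The second function inverts the standardized (Spearman-Brown) form of alpha to give the mean interitem correlation implied by a given alpha; that form assumes equal item variances, which is an assumption introduced here and only approximates the raw-score alpha reported above. Both function names are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_alpha(responses: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha for a students-by-items matrix of item scores."""
    k = len(responses[0])
    item_vars = [statistics.pvariance(col) for col in zip(*responses)]
    total_var = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def implied_mean_correlation(alpha: float, k: int) -> float:
    """Mean interitem correlation implied by standardized alpha with k items."""
    # Inverts alpha = k * r / (1 + (k - 1) * r).
    return alpha / (k - (k - 1) * alpha)
```

Under that equal-variance assumption, an alpha of .836 on 35 items corresponds to a mean interitem correlation of roughly .13.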

Number of factors extracted. The eigenvalues and percent of total variance
accounted for by the first 15 factors from the APT and random data are given
in Appendix Table B. The plots of eigenvalues versus factors extracted for
both the APT and the random data are given in Figure 2a for the pretest and in
Figure 2b for the posttest. In both cases, there was one relatively strong
factor in the data; the eigenvalue for the first factor extracted from the APT
was much larger than the eigenvalues for the remaining factors in the APT and
for all the factors in the random data. The same cannot be said for any of
the remaining factors. It was concluded that a one-factor solution adequately
described the item response data from both the pretest and the posttest. The
FACTOR subroutine in SPSS (Nie et al., 1975) was then run again on the data
from each administration, specifying a single-factor solution each time.

Factor similarity. The factor loadings on the single factor extracted from
each administration of the APT and from corresponding random data are given in
Table 2. The loadings presented in Table 2 were of moderate magnitude; the
majority of the loadings were greater than .300, but all were less than .700.
The patterns and the magnitudes of the loadings were essentially the same
across test administrations. For example, Items 2 through 5 and Item 28 were
among the items with the lowest loadings at the pretest; the same was true for
these items at the posttest. The items with the highest loadings at the
pretest were also among the items with the highest loadings at the posttest.
That the magnitude of the loadings was similar for the two administrations can
also be seen by comparing the percentage of total variance accounted for by
each factor. The single factor extracted from the APT pretest data accounted
for 13.92% of the total variance compared to 3.05% for the random data. The
factor extracted from the APT posttest data was only slightly stronger,
accounting for 14.59% of the total variance as compared to 2.40% in the random
data.

Table 2
Factor Loadings on the Single Factor
Extracted from APT at Pretest and at Posttest,
and from Corresponding Random Data

                   Pretest                  Posttest
Item           APT    Random Data       APT    Random Data
  1           .289       .124          .303      -.042
  2           .088       .027         -.004       .130
  3           .058       .315          .152      -.049
  4           .160       .010          .219      -.051
  5           .191       .230          .226       .140
  6           .263      -.187          .255       .172
  7           .332      -.188          .118       .032
  8           .315       .147          .383       .036
  9           .156       .099          .341       .051
 10           .384       .150          .495      -.017
 11           .453      -.229          .253      -.277
 12           .372      -.178          .244      -.170
 13           .255       .007          .259      -.066
 14           .394       .345          .338       .136
 15           .376       .215          .440       .222
 16           .575      -.089          .545       .023
 17           .426       .075          .436      -.046
 18           .562      -.285          .484       .071
 19           .491      -.136          .440       .330
 20           .588       .109          .506       .135
 21           .580       .029          .676       .025
 22           .460       .185          .418       .212
 23           .344      -.200          .378       .319
 24           .370       .402          .433       .084
 25           .338      -.028          .500       .051
 26           .460       .108          .560       .005
 27           .357      -.074          .467      -.015
 28           .117       .044          .141       .054
 29           .495       .042          .481       .044
 30           .291       .162          .294       .196
 31           .292      -.276          .352       .006
 32           .378       .018          .386       .017
 33           .318       .084          .281       .195
 34           .313       .090          .359       .128
 35           .339       .153          .267      -.442
Percent of
Total Variance  13.92    3.05          14.59      2.40

Table 3 presents the measures of factor similarity between the APT factor
loadings at pretest and at posttest. The root-mean-square deviation between
the loadings extracted at each administration is sensitive to differences in
the absolute levels of the loadings; low values indicate only minor
differences between the values of the two sets of loadings. The
root-mean-square deviation was a low .089 for these data. The product-moment
correlation coefficient is sensitive only to differences in the patterns of
the loadings and was equal to .793. The coefficient of congruence is
sensitive to differences in both the level and the pattern of loadings and was
a high .972. High values for these latter two indices indicate a high degree
of similarity between the two sets of factor loadings. The three figures
computed from the parallel random data were .219, .067, and .118,
respectively. It was concluded that the factors extracted from each
administration of the APT were nearly identical, both in nature and in
strength.

Table 3
Measures of Factor Similarity Between Factor Loadings of APT at Pretest
and at Posttest and Between Factor Loadings for Corresponding Random Data

Similarity Index                         APT     Random Data
Root-Mean-Square Deviation              .089        .219
Pearson Product-Moment Correlation      .793        .067
Coefficient of Congruence               .972        .118
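
The three similarity indices, and the percent of total variance for each factor, can be recomputed from the APT loadings in Table 2. The sketch below uses the standard definitions of the indices; the function names are illustrative, and the percent-of-variance formula (sum of squared loadings divided by the number of items) is an assumption that happens to reproduce the reported figures.

```python
import math
from typing import Sequence

# APT loadings for Items 1-35, transcribed from Table 2.
PRETEST = [
    0.289, 0.088, 0.058, 0.160, 0.191, 0.263, 0.332, 0.315, 0.156, 0.384,
    0.453, 0.372, 0.255, 0.394, 0.376, 0.575, 0.426, 0.562, 0.491, 0.588,
    0.580, 0.460, 0.344, 0.370, 0.338, 0.460, 0.357, 0.117, 0.495, 0.291,
    0.292, 0.378, 0.318, 0.313, 0.339,
]
POSTTEST = [
    0.303, -0.004, 0.152, 0.219, 0.226, 0.255, 0.118, 0.383, 0.341, 0.495,
    0.253, 0.244, 0.259, 0.338, 0.440, 0.545, 0.436, 0.484, 0.440, 0.506,
    0.676, 0.418, 0.378, 0.433, 0.500, 0.560, 0.467, 0.141, 0.481, 0.294,
    0.352, 0.386, 0.281, 0.359, 0.267,
]


def rms_deviation(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference; sensitive to the level of the loadings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Product-moment correlation; sensitive only to the pattern of loadings."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def congruence(a: Sequence[float], b: Sequence[float]) -> float:
    """Coefficient of congruence; sensitive to both level and pattern."""
    cross = sum(x * y for x, y in zip(a, b))
    return cross / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def percent_variance(loadings: Sequence[float]) -> float:
    """Sum of squared loadings as a percentage of the number of items."""
    return 100.0 * sum(x * x for x in loadings) / len(loadings)
```

Applied to `PRETEST` and `POSTTEST`, these functions should agree with the APT column of Table 3 and with the percent-of-variance row of Table 2 to within about .001 and .01, respectively; the small discrepancies are what would be expected from loadings rounded to three decimals.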

Conclusions 

Differences  in  Achievement  Level  Estimates 

There was evidence in these data to conclude that there were gains in mean
achievement levels observed after a course of instruction. The difference
between the means of scores on the 35-item pretest and posttest was nearly 7
items; the frequency distribution of number-correct scores changed from a
symmetric distribution to one that was negatively skewed and displaced to the
right. This same effect was mirrored in the distributions of item
difficulties. The correlation between the two sets of number-correct scores
was .542, indicating that students did not generally maintain their relative
standings in the course after instruction. It is not known to what extent
this correlation was attenuated due to the ceiling effect observed for the
posttest scores.

Differences  in  the  Structure  of  Achievement 

Although there was definitive evidence of mean quantitative change from
pretest to posttest, there was no evidence of qualitative differences in the
factor structure underlying the item responses. The internal consistency
reliability of the test remained constant across administrations. When factor
analyses were performed separately on the pretest and posttest interitem
correlation matrices, essentially the same factor was extracted each time, as
evidenced by the similarity in the levels and pattern of factor loadings.

These  data  indicate,  then,  that  students  in  the  General  College  arithmetic 
classes  were  indeed  leaving  the  course  with  increased  levels  of  the  same  vari¬ 
able  measured  prior  to  instruction.  The  change  that  occurred  within  the  quarter 
was  quantitative,  not  qualitative. 




STUDY  II 
Method 


Subjects 


Data  were  collected  from  students  enrolled  in  a  general  biology  class  at 
the  University  of  Minnesota  during  winter  quarter  of  1980.  A  paper-and-pencil 
pretest  was  administered  to  all  students  present  on  the  first  day  of  class. 
Computer-administered  conventional  posttests  were  given  before  classroom  mid¬ 
quarter  and  final  examinations  to  volunteer  students  who  were  awarded  extra¬ 
credit  points  for  their  participation. 

Design 


Tests. There were two different tests administered at various times
throughout  the  quarter.  Test  A  included  14  items  from  each  of  the  three  content 
areas  covered  in  class  lectures  before  the  midquarter  exam  (chemistry,  the  cell, 
and  energy).  Test  B  included  14  items  from  each  of  the  last  three  content  areas 
in the course (genetics, reproduction/embryology, and ecology).

Experimental  groups.  The  data  collection  design  for  this  study  is  shown  in 
Figure  3.  Students  were  randomly  assigned  to  two  experimental  groups,  Groups  1 
and  2,  corresponding  to  the  groups  of  students  who  were  administered  one  of  two 
pretests — Tests  A  or  B,  respectively — on  the  first  day  of  class.  Group  3  in¬ 
cluded  students  who  were  absent  for  the  first  class  meeting  or  who  did  not  re¬ 
cord  on  their  answer  sheet  which  test  they  took. 

Figure 3

Data Collection Design for Study II

[Figure not reproduced. Surviving entries, final exam posttest: Group 1, Test A; Group 2, Test B; Group 3, Test B.]



During the two weeks immediately preceding the classroom midquarter examination,
volunteer students were administered conventional tests on the computer
(MQ posttest). All these students were administered Test A. During the two
weeks immediately preceding the final exam, volunteer students were administered
conventional tests on the computer (final exam posttest). Students in Group 1
were readministered Test A; students in Groups 2 and 3 were administered Test B.

All  item  responses  were  coded  as  correct,  incorrect,  or  missing.  Missing 
or  omitted  items  did  not  present  an  important  problem  for  this  set  of  data. 
Nevertheless, the same 15%-missing-data criterion was used here as was used in
the  previous  study:  a  student's  response  protocol  was  deleted  from  the  data  set 
if  the  student  omitted  more  than  6  (i.e.,  15%  of  42)  items  on  any  one  test.  For 
the  students  included  in  the  analysis,  all  missing  data  were  coded  as  incorrect. 
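
The screening rule just described can be illustrated with a short sketch. The Python below is not the authors' procedure; the function name `screen_protocols`, the use of `None` to mark a missing response, and the 1/0 coding of correct and incorrect answers are hypothetical conventions chosen only for illustration.

```python
def screen_protocols(protocols, max_omitted=6):
    """Apply a missing-data criterion to a list of response protocols.

    Each protocol is a list with 1 (correct), 0 (incorrect), or None (missing).
    A protocol is dropped when it has more than `max_omitted` missing items
    (6 is the whole-item equivalent of 15% of 42 items); in retained
    protocols, missing responses are recoded as incorrect (0).
    """
    retained = []
    for responses in protocols:
        omitted = sum(1 for r in responses if r is None)
        if omitted > max_omitted:
            continue  # protocol deleted from the data set
        retained.append([0 if r is None else r for r in responses])
    return retained
```

With 42 items, a protocol with exactly 6 omissions would be kept (and its omissions scored as incorrect), whereas one with 7 omissions would be deleted.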

Analyses 


Differences  in  achievement  level  estimates:  Test  A.  The  question  of 
whether  or  not  students'  achievement  level  estimates  on  Test  A  increased  from 
the  pretest  to  the  MQ  posttest  could  be  answered  by  examining  the  performance  of 
Group  1  students  on  Test  A  at  both  testing  occasions.  However,  the  number  of 
students who took Test A both times was small (N = 102) compared to the total
number of students who took Test A at the pretest only (N = 276) and the total
number of students who took Test A at the MQ posttest only (N = 302). A more
powerful test of the difference in mean achievement levels could be performed by
combining the data from all students who took Test A at the MQ posttest and by
comparing their performance with that of all the students who took Test A as a
pretest.

For  this  comparison,  it  was  necessary  to  assume  that  the  three  groups  of 
students  being  combined  at  the  MQ  posttest  were  equivalent.  Group  1  students 
were  administered  Test  A  both  at  the  pretest  and  at  the  MQ  posttest.  (Although 
Test  A  was  also  administered  again  at  the  final  exam  posttest,  the  number  of 
Group  1  students  who  returned  to  take  Test  A  at  the  final  exam  posttest  was  too 
small  for  meaningful  comparisons  to  be  made.  Hence,  Test  A  analyses  were  con¬ 
fined  to  the  pretest  and  MQ  posttest  administrations.)  Performance  of  Group  1 
students  on  Test  A  at  the  MQ  posttest  can  be  attributed  to  the  students'  under¬ 
lying  ability,  to  the  classroom  instruction,  and/or  to  the  repetition  of  items 
from  one  occasion  to  the  next.  Group  2  students,  on  the  other  hand,  were  admin¬ 
istered  Test  B  as  the  pretest  and  were  administered  Test  A  for  the  first  time  at 
the  MQ  posttest.  Performance  of  Group  2  students  on  Test  A,  then,  could  be  at¬ 
tributed  only  to  the  students'  underlying  ability  and/or  to  the  classroom  in¬ 
struction.  For  some  Group  3  students  (those  who  were  absent  on  the  first  day  of 
class),  performance  on  Test  A  could  also  be  attributed  to  their  underlying  abil¬ 
ity  and/or  to  the  classroom  instruction  only.  For  the  other  Group  3  students 
(those  who  did  not  record  which  pretest  they  took),  however,  Test  A  performance 
could  be  attributed  to  their  underlying  ability,  to  the  classroom  instruction, 
and/or  to  item  repetition.  Since  these  two  subgroups  of  Group  3  students  could 
not  be  identified  and  separated  for  analysis,  however,  Group  3  was  omitted  from 
the  following  comparison  for  Test  A. 

Because  students  were  randomly  assigned  to  Groups  1  and  2  on  the  first  day 
of  class,  and  because  classroom  instruction  was  the  same  for  all  students,  any 
differences observed between Groups 1 and 2 on their performance on Test A would
reflect a repetition-of-items effect. If mean test scores of Groups 1 and 2
were  not  significantly  different  from  each  other,  then  Groups  1  and  2  could  be 
combined  at  the  MQ  posttest  and  compared  with  all  students  from  Group  1  at  the 
pretest.  If  a  significant  repetition-of-items  effect  were  found,  then  subse¬ 
quent  analyses  should  be  performed  only  on  the  data  from  those  students  in  Group 
1.  Differences  between  the  scores  of  Group  1  and  Group  2  students  were  evaluat¬ 
ed  by  the  use  of  a  t  test  for  the  difference  between  two  independent  groups  and 
by  the  Kolmogorov-Smirnov  two-sample  test  for  the  difference  between  two  fre¬ 
quency  distributions. 
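
As a rough illustration of the second comparison, the two-sample Kolmogorov-Smirnov statistic is the largest absolute difference between the two empirical cumulative distributions. The sketch below computes that maximum difference D for two lists of number-correct scores; the function name is hypothetical, and the report does not state which form or transformation of the statistic the authors tabulated.

```python
def ks_two_sample_d(sample_a, sample_b):
    """Maximum absolute difference between two empirical cumulative distributions."""
    n_a, n_b = len(sample_a), len(sample_b)
    d = 0.0
    # Evaluate both cumulative distributions at every observed score value.
    for value in sorted(set(sample_a) | set(sample_b)):
        cdf_a = sum(1 for x in sample_a if x <= value) / n_a
        cdf_b = sum(1 for x in sample_b if x <= value) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples give D = 0, and D approaches 1 as the two score distributions cease to overlap.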

Analyses  relevant  to  the  issue  of  differences  in  achievement  scores  includ¬ 
ed  examination  of  the  frequency  distributions  and  summary  statistics  of  num¬ 
ber-correct  scores  and  the  distributions  of  item  difficulties  from  the  pretest 
and  the  MQ  posttest. 

Differences  in  the  structure  of  achievement:  Test  A.  The  question  of 
whether  or  not  there  were  qualitative  changes  in  the  nature  of  achievement  test 
scores  due  to  instruction  was  again  investigated,  as  in  Study  I,  by  analysis  of 
internal  consistency  reliability  coefficients  and  by  separate  principal-axes 
factor  analyses.  These  analyses  were  performed  separately  on  the  pretest  and  MQ 
posttest  data  interitem  correlation  matrices,  with  communalities  estimated  using 
an  iterative  procedure,  as  described  in  Study  I.  The  number  of  nonrandom  fac¬ 
tors  was  again  determined  by  comparing  the  results  of  the  factor  analyses  of 
Test  A  data  with  the  results  of  factor  analyses  of  random  data  based  on  items  of 
similar  difficulty. 
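
The logic of the random-data comparison can be sketched as follows. This is a simplified illustration, not the authors' procedure: it simulates independent 0/1 responses with specified item difficulties and approximates the largest eigenvalue of the full interitem correlation matrix (unities in the diagonal) by power iteration, whereas the report used principal-axes factoring with iterated communality estimates. All function names are hypothetical.

```python
import math
import random


def random_binary_data(difficulties, n_persons, seed=0):
    """Simulate independent 0/1 responses; item j is correct with probability difficulties[j]."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in difficulties]
            for _ in range(n_persons)]


def correlation_matrix(data):
    """Interitem Pearson (phi) correlations for a persons-by-items 0/1 matrix."""
    n = len(data)
    k = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n)
           for j in range(k)]
    if any(sd == 0 for sd in sds):
        raise ValueError("an item has zero variance; correlations are undefined")
    corr = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            cov = sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / n
            corr[a][b] = corr[b][a] = cov / (sds[a] * sds[b])
    return corr


def largest_eigenvalue(matrix, iterations=500):
    """Approximate the largest eigenvalue of a symmetric matrix by power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    # Rayleigh quotient of the final unit-length vector
    prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
    return sum(v * p for v, p in zip(vec, prod))
```

Under this sketch, a factor in the real data would be regarded as nonrandom only if its eigenvalue clearly exceeded the corresponding eigenvalue obtained from random data generated with similar item difficulties and the same number of examinees.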

The  results  of  the  final  solutions  from  the  pretest  and  the  MQ  posttest 
were  then  compared  in  terms  of  the  numbers  of  factors  extracted  and  the  similar¬ 
ity  of  these  factors.  As  in  Study  I,  factor  similarity  was  indexed  by  the  root- 
mean-square  deviation,  the  product-moment  correlation  coefficient,  and  the  coef¬ 
ficient  of  congruence  between  the  factor  loadings  obtained  at  each  occasion  in 
comparison  with  values  obtained  from  two  sets  of  random  data. 

Differences  in  achievement  level  estimates:  Test  B.  The  question  of 
whether  or  not  students'  achievement  level  estimates  on  Test  B  increased  from 
the  pretest  to  the  final  exam  posttest  could  be  answered  by  examining  the  per¬ 
formance  of  Group  2  students  on  Test  B  at  both  testing  occasions.  However,  if 
no significant repetition-of-items effect was found for Test A (as discussed
above),  the  assumption  could  be  made  that  there  would  be  no  repetition-of-items 
effect  for  Test  B;  then  there  would  be  justification  for  combining  the  data  on 
Test  B  from  Groups  2  and  3  at  the  final  exam  in  order  to  conduct  a  more  powerful 
test  of  the  difference  between  mean  achievement  level  estimates.  Analyses  rele¬ 
vant  to  this  question  included  examination  of  the  frequency  distributions  and 
summary  statistics  of  number-correct  scores,  and  the  distributions  of  item  dif¬ 
ficulties  from  the  pretest  and  the  final  exam  posttest. 

Differences  in  the  structure  of  achievement:  Test  B.  As  described  above, 
the  internal  consistency  reliability  coefficient  (coefficient  alpha)  was  comput¬ 
ed  for  Test  B  at  the  pretest  and  at  the  final  exam  posttest.  Separate  principal 
axes  factor  analyses  were  also  performed  on  the  Test  B  data  and  on  parallel  ran¬ 
dom  data.  The  final  factor  solutions  of  Test  B  from  the  pretest  and  the  final 
exam  posttest  were  also  compared  in  terms  of  the  number  of  factors  extracted  and 
the  similarity  of  these  factors,  as  was  done  in  Study  I  and  for  Test  A  in  this 
study. 




Results 


Effect  of  Item  Repetition 

The  effect  on  achievement  level  estimates  of  repeating  items  from  the  pre¬ 
test  to  a  posttest  was  evaluated  by  comparing  the  performance  of  students  in 
Groups  1  and  2  on  Test  A  administered  before  the  midquarter  exam  (MQ  posttest). 
There  were  102  students  from  Group  1  who  volunteered  to  take  the  MQ  posttest,  of 
which  98  met  the  15%-missing-data  criterion  and  were  retained  for  analyses.  For 
Group  2  these  figures  were  101  and  91,  respectively. 

Appendix  Table  C  presents  the  frequency  distributions  of  number-correct 
scores  for  Test  A  administered  at  the  MQ  posttest  to  students  from  Groups  1  and 
2;  the  frequency  polygons  are  displayed  in  Figure  4.  For  Group  1  the  mean  test 
score  was  24.19,  the  median  was  23.79,  and  the  standard  deviation  was  5.87.  For 
Group 2 these statistics were 22.59, 21.80, and 6.26, respectively. The t statistic
for the difference between the means of independent groups was
1.98; this was not statistically significant at p = .01. The entire frequency
distributions of Groups 1 and 2 were compared by using a Kolmogorov-Smirnov
two-sample test; the statistic calculated was equal to 7.86, which was not
statistically significant at p = .01.


Figure  4 

Grouped  Frequency  Distributions  of  Number-Correct  Scores 
for  Biology  Test  A  Administered  at  MQ  Posttest 
for  Groups  1  and  2 


[Frequency polygons not reproduced; horizontal axis: Number-Correct Score.]


Although  the  observed  differences  were  in  the  predicted  direction,  the  ef¬ 
fect  of  item  repetition  was  not  statistically  significant.  Hence,  the  question 
of  identifying  and  separating  the  two  subgroups  of  Group  3  was  no  longer  rele¬ 
vant,  and  the  Test  A  MQ  posttest  scores  of  students  in  Groups  1,  2,  and  3  were 
combined  for  comparison  with  the  scores  of  all  students  who  took  Test  A  on  the 
first day of class. Since some of the students who took the test at the pretest
did not take it at the posttest, the correlation between scores at pretest and
posttest was not computed.

Missing  Data 


There  were  276  students  who  were  administered  Test  A  at  the  pretest;  of 
these  272  met  the  15%-missing-data  criterion  and  were  retained  for  further  anal¬ 
yses.  The  combined  total  of  students  who  took  Test  A  at  the  MQ  posttest  was 
302,  and  283  of  these  were  retained  for  further  analyses. 

Because  there  was  no  effect  of  item  repetition  observed  for  Test  A,  the 
performance  of  Group  2  students  who  were  administered  Test  B  at  the  pretest  was 
compared  with  the  performance  of  students  from  both  Groups  2  and  3  who  were  ad¬ 
ministered  Test  B  at  the  final  exam  posttest.  There  were  283  students  who  were 
administered  Test  B  at  the  pretest,  of  which  277  met  the  15%-missing-data  crite¬ 
rion  and  were  retained  for  further  analyses.  A  total  of  169  students  took  Test 
B  at  the  final  exam  posttest,  and  163  of  them  were  retained  for  further  analy¬ 
ses. 

Differences  in  Achievement  Level  Estimates:  Test  A 


Total  score  differences.  Frequency  distributions  of  number-correct  scores 
on  Test  A  at  both  testing  occasions  are  presented  in  Appendix  Table  D;  the  fre¬ 
quency  polygons  appear  in  Figure  5.  Both  distributions  are  approximately  sym¬ 
metric,  with  the  distribution  of  MQ  posttest  scores  displaced  to  the  right.  The 
mean  of  the  pretest  scores  was  15.97,  with  a  standard  deviation  of  3.97.  For 
the  MQ  posttest  scores,  these  figures  were  23.46  and  5.99,  respectively.  The 
mean  score  difference  between  the  two  occasions  was  7.49.  Because  there  was 
some overlap between the students in the two groups, the groups were not strictly
independent, nor were they strictly dependent. A t test for the difference
between two independent means, although technically inappropriate, would yield a
conservative test of the significance of this difference. This test resulted in
t (df = 553) = 17.34, p < .001.
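
The order of magnitude of this statistic can be checked from the reported summary statistics. The sketch below assumes a pooled-variance t test, standard deviations computed with N - 1 in the denominator, and the retained sample sizes of 272 (pretest) and 283 (MQ posttest); the function name is hypothetical, and the authors' exact computation is not documented here.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error, df


# Test A: pretest (mean 15.97, SD 3.97) versus MQ posttest (mean 23.46, SD 5.99)
t_value, df = pooled_t(15.97, 3.97, 272, 23.46, 5.99, 283)
```

Under these assumptions the degrees of freedom equal 553, as reported, and the t value falls close to the reported 17.34; an exact match should not be expected because the inputs are rounded.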

Item  difficulties.  The  frequency  distributions  of  item  difficulties  for 
Test  A  at  both  testing  occasions  are  given  in  Table  4.  As  indicated  earlier, 
the  pretest  was  somewhat  difficult:  74%  of  the  items  were  answered  correctly  by 
less  than  half  the  students,  and  no  item  was  answered  correctly  more  than  80%  of 
the  time.  After  instruction,  more  than  half  the  items  (23  of  42)  were  answered 
correctly  by  51%  to  90%  of  the  students,  although  five  items  were  answered  cor¬ 
rectly  less  than  30%  of  the  time. 
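
The percentages quoted in this paragraph follow directly from the frequency counts in Table 4. The short sketch below tallies them; the counts are transcribed from Table 4 (ten difficulty intervals from .00-.10 through .91-1.00), and the variable names are illustrative only.

```python
# Number of items in each difficulty interval, .00-.10 through .91-1.00 (Table 4)
pretest_counts = [1, 8, 8, 9, 5, 4, 2, 5, 0, 0]
posttest_counts = [1, 1, 3, 7, 7, 5, 5, 5, 8, 0]

n_items = sum(pretest_counts)

# Items answered correctly by no more than half the students at the pretest
pretest_share_hard = sum(pretest_counts[:5]) / n_items

# Items in the .51-.90 intervals and below .30 at the MQ posttest
posttest_mid_to_easy = sum(posttest_counts[5:9])
posttest_very_hard = sum(posttest_counts[:3])
```

Here `pretest_share_hard` is 31 of 42 items (about 74%), `posttest_mid_to_easy` is 23, and `posttest_very_hard` is 5, in agreement with the text.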

Differences  in  the  Structure  of  Achievement:  Test  A 


Internal consistency reliability. Coefficient alpha for Test A when administered
on the first day of class was .490. This low value indicates that the
average interitem correlation was correspondingly small. After instruction,
coefficient alpha increased to .787 for the same set of items. Although this
value is not high for a 42-item test, it represents a substantial increase over
the value obtained at the pretest. The difference between these two figures may
indicate that the items were functioning as a set differently after instruction
than they were before instruction and/or it may reflect the increase in the
variance of the number-correct scores.
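
The link between coefficient alpha and the average interitem correlation can be made concrete with a brief sketch. The first function computes alpha from a persons-by-items matrix of 0/1 scores; the second inverts the standardized form of alpha, alpha = k r / (1 + (k - 1) r), to obtain the mean interitem correlation r that it implies. Using the standardized form is an assumption made for illustration (the reported values are presumably raw coefficients), and both function names are hypothetical.

```python
from statistics import variance


def coefficient_alpha(data):
    """Coefficient alpha for a persons-by-items score matrix (list of rows)."""
    k = len(data[0])
    item_variances = [variance([row[j] for row in data]) for j in range(k)]
    total_variance = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def implied_mean_r(alpha, k):
    """Mean interitem correlation implied by a standardized alpha for k items."""
    return alpha / (k - (k - 1) * alpha)


pretest_r = implied_mean_r(0.490, 42)   # roughly .02
posttest_r = implied_mean_r(0.787, 42)  # roughly .08
```

Under this assumption, the reported alphas of .490 and .787 for the 42-item Test A correspond to average interitem correlations of only about .02 and .08, which is consistent with the weak factors described in this section.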




Figure  5 

Grouped  Frequency  Distributions  of  Number-Correct  Scores 
for  Biology  Test  A  Administered  at  Pretest  and  at  MQ  Posttest 


Table 4

Frequency Distributions of Item Difficulties for Biology Test A
Administered at Pretest and at MQ Posttest

Range of Item          Number of Items
Difficulty           Pretest    Posttest
 .00 -  .10             1           1
 .11 -  .20             8           1
 .21 -  .30             8           3
 .31 -  .40             9           7
 .41 -  .50             5           7
 .51 -  .60             4           5
 .61 -  .70             2           5
 .71 -  .80             5           5
 .81 -  .90             0           8
 .91 - 1.00             0           0
Mean Difficulty        .38         .56



Number  of  factors  extracted.  Appendix  Table  E  presents  the  eigenvalues  and 
percent  of  total  variance  accounted  for  by  the  first  15  factors  from  Test  A  and 
from  corresponding  random  data.  Figure  6a  presents  the  plots  of  eigenvalues 
versus  factors  extracted  from  Test  A  and  from  random  data  at  the  pretest,  and 
Figure  6b  presents  results  for  the  MQ  posttest.  Comparison  of  the  results  from 
Test  A  with  the  results  from  the  corresponding  random  data  revealed  that  there 
was  one  weak  factor  present  in  the  pretest  and  one  stronger  factor  present  in 
the  posttest. 

Factor  similarity.  Table  5  presents  the  factor  loadings  on  the  single  fac¬ 
tor  extracted  at  each  testing  occasion  from  Test  A  and  from  corresponding  random 
data.  Comparison  of  these  factor  loadings  reveals  that  the  loadings  from  the  MQ 
posttest  were,  in  general,  higher  than  those  from  the  pretest.  No  loading  from 
the  pretest  was  greater  than  .391,  and  nearly  two-thirds  of  the  factor  loadings 
(26  of  42)  were  less  than  .200.  For  the  MQ  posttest,  the  highest  loading  was 
.502,  but  81%  of  the  factor  loadings  (34  of  42)  were  greater  than  .200. 

This  result  can  also  be  seen  by  comparing  the  percentages  of  total  variance 
accounted  for  by  the  single  factor  at  each  administration.  For  the  pretest  that 
figure  was  3.96%  (as  compared  to  2.88%  for  the  random  data);  for  the  MQ  posttest 
the  factor  accounted  for  9.36%  of  the  total  variance  (as  compared  to  2.79%  for 
the  random  data).  Both  of  these  percentages  are  small  for  a  42-item  test,  indi¬ 
cating  that  the  factor  was  relatively  weak,  even  at  the  MQ  posttest. 

The  pattern  of  factor  loadings  did  not  appear  to  be  consistent  across  test 
administrations.  The  items  with  the  lowest  loadings  at  the  pretest  did  not 
emerge  as  the  items  with  the  lowest  loadings  at  the  MQ  posttest,  and  the  same 
was  true  for  the  items  with  the  highest  loadings. 

Table  6  presents  the  measures  of  factor  similarity  between  the  two  sets  of 
loadings  for  Test  A  and  the  corresponding  random  data.  The  root-mean-square 
deviation  between  the  two  sets  of  loadings  for  Test  A,  sensitive  to  differences 
in  levels  of  the  loadings,  was  .195,  a  high  value  when  considered  in  conjunction 
with  the  relatively  narrow  range  of  loadings  observed  in  these  data.  The  prod¬ 
uct-moment  correlation  coefficient  between  the  loadings,  sensitive  to  pattern 
differences,  was  a  low  .373.  The  coefficient  of  congruence  was  .780.  The  simi¬ 
larity  measures  obtained  from  the  random  data  were  .160,  .549,  and  .548,  respec¬ 
tively.  All  these  figures  reveal  that  the  factors  extracted  from  Test  A  on  the 
two  occasions  were  not  substantially  more  similar  than  were  factors  extracted 
from  randomly  generated  data. 
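
The three similarity indices, and the percent of total variance, can be recomputed from the loadings in Table 5. The sketch below uses the Test A loadings transcribed from that table; the function names are illustrative, and the percent of total variance is assumed to be the sum of squared loadings divided by the number of items.

```python
import math

# Test A loadings on the single factor, items 1-42 (transcribed from Table 5)
pretest = [0.068, 0.024, 0.331, 0.115, -0.002, 0.206, 0.280,
           0.191, 0.272, 0.027, 0.291, 0.103, 0.370, 0.391,
           0.042, 0.273, 0.133, 0.239, 0.388, 0.205, 0.115,
           0.223, 0.383, 0.245, 0.052, -0.024, 0.039, 0.015,
           0.117, 0.343, 0.095, 0.194, 0.043, 0.059, 0.096,
           -0.026, 0.221, 0.107, 0.106, -0.111, -0.124, 0.063]
posttest = [0.186, 0.133, 0.161, 0.279, 0.276, 0.008, 0.372,
            0.333, 0.408, 0.367, 0.154, 0.207, 0.502, 0.344,
            0.388, 0.341, 0.335, 0.310, 0.276, 0.410, 0.316,
            0.479, 0.298, 0.373, 0.228, 0.246, 0.478, 0.143,
            0.315, 0.372, 0.200, 0.284, 0.272, 0.249, 0.301,
            0.245, 0.340, 0.227, 0.241, -0.030, 0.164, 0.422]


def rms_deviation(x, y):
    """Root-mean-square deviation; sensitive to differences in level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def pearson_r(x, y):
    """Product-moment correlation; sensitive to differences in pattern."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def congruence(x, y):
    """Coefficient of congruence: cross-products of loadings, not mean-centered."""
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))


def percent_variance(loadings):
    """Percent of total variance: sum of squared loadings over number of items."""
    return 100 * sum(a * a for a in loadings) / len(loadings)
```

Applied to these two columns, the functions should reproduce the Test A values in Table 6 (.195, .373, and .780) and the percentages of total variance in Table 5 (3.96 and 9.36) to within rounding of the published loadings.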

These  data  reveal,  then,  that  the  factor  extracted  from  Test  A  at  the  pre¬ 
test  differed  substantially  from  that  extracted  at  the  MQ  posttest.  Although 
there  was  a  sizeable  increase  in  the  number-correct  scores  after  instruction, 
there  was  a  corresponding  change  in  the  first  factor  underlying  the  item  respon¬ 
ses.  This  indicates  that  the  pretest  and  the  MQ  posttest  measured  quite  differ¬ 
ent  variables,  even  though  they  were  composed  of  exactly  the  same  items. 

Differences  in  Achievement  Level  Estimates:  Test  B 


Total  score  differences.  Frequency  distributions  of  number-correct  scores 
on  Test  B  at  both  testing  occasions  are  given  in  Appendix  Table  F;  their  fre¬ 
quency  polygons  are  presented  in  Figure  7.  The  distribution  of  final  exam  post- 




Table 5

Factor Loadings on the Single Factor Extracted from Biology Test A
at Pretest and at MQ Posttest, and from Corresponding Random Data

               Pretest                  Posttest
Item     Test A   Random Data     Test A   Random Data
  1       .068      -.032          .186       .158
  2       .024      -.026          .133      -.205
  3       .331      -.245          .161       .051
  4       .115       .163          .279       .150
  5      -.002      -.238          .276      -.099
  6       .206      -.054          .008       .029
  7       .280       .191          .372       .121
  8       .191      -.246          .333      -.153
  9       .272       .096          .408       .120
 10       .027      -.005          .367      -.002
 11       .291      -.163          .154      -.154
 12       .103      -.035          .207       .011
 13       .370       .327          .502       .208
 14       .391      -.197          .344      -.223
 15       .042       .440          .388       .418
 16       .273      -.010          .341       .296
 17       .133      -.042          .335       .079
 18       .239      -.105          .310      -.162
 19       .388       .021          .276       .162
 20       .205       .362          .410       .222
 21       .115      -.059          .316      -.098
 22       .223      -.040          .479      -.161
 23       .383       .060          .298       .024
 24       .245       .067          .373      -.114
 25       .052      -.053          .228       .187
 26      -.024      -.116          .246      -.105
 27       .039       .091          .478       .083
 28       .015      -.094          .143       .060
 29       .117       .061          .315       .244
 30       .343      -.139          .372      -.224
 31       .095       .070          .200       .057
 32       .194      -.027          .284      -.154
 33       .043       .179          .272       .255
 34       .059      -.050          .249       .337
 35       .096      -.150          .301       .190
 36      -.026       .148          .245       .206
 37       .221      -.139          .340      -.021
 38       .107      -.185          .227      -.095
 39       .106       .282          .241      -.016
 40      -.111      -.344         -.030       .077
 41      -.124       .162          .164      -.041
 42       .063       .113          .422       .117
Percent of
Total Variance
          3.96       2.88          9.36       2.79



Table 6

Measures of Factor Similarity Between Factor Loadings for Test A
at Pretest and at MQ Posttest, and Between Factor Loadings
from Corresponding Random Data

Similarity Index                      Test A    Random Data
Root-Mean-Square Deviation             .195        .160
Pearson Product-Moment Correlation     .373        .549
Coefficient of Congruence              .780        .548

test  scores  is  approximately  symmetric,  while  that  of  the  pretest  scores  is 
slightly  positively  skewed.  The  mean  of  the  pretest  scores  was  15.18,  with 
standard  deviation  3.54.  For  the  final  exam  posttest  scores,  these  figures  were 
21.47 and 4.58, respectively. The difference between the mean scores on
the two occasions was 6.29. As before, a t test for the difference between two
independent means, though technically inappropriate, was conducted as a conservative
test of this difference; here, t (df = 438) = 16.15, p < .001.

Figure  7 

Grouped  Relative  Frequency  Distributions  of  Number-Correct  Scores 
for  Biology  Test  B  Administered  at  Pretest  and  at  Final  Exam  Posttest 


Item  difficulties.  The  frequency  distributions  of  item  difficulties  for 
Test  B  at  both  testing  occasions  are  given  in  Table  7.  As  was  observed  for  the
number-correct  scores,  the  pattern  of  item  difficulties  reveals  that  the  pretest
was  somewhat  difficult:  74%  of  the  items  were  answered  correctly  by  less  than 
half  the  students,  and  only  two  items  were  answered  correctly  more  than  80%  of 
the  time.  At  the  end  of  the  course,  more  than  half  the  items  (22  of  42)  were 
answered  correctly  by  the  majority  of  students,  although  12  items  were  answered 
correctly  less  than  30%  of  the  time. 


Table 7

Frequency Distributions of Item Difficulties for Biology Test B
Administered at Pretest and at Final Exam Posttest

Range of Item          Number of Items
Difficulty           Pretest    Posttest
 .00 -  .10             4           2
 .11 -  .20             9           3
 .21 -  .30             8           7
 .31 -  .40             3           4
 .41 -  .50             7           4
 .51 -  .60             5           2
 .61 -  .70             2          10
 .71 -  .80             2           5
 .81 -  .90             2           4
 .91 - 1.00             0           1
Mean Difficulty        .36         .51

Differences  in  the  Structure  of  Achievement:  Test  B 


Internal  consistency  reliability.  When  administered  at  the  pretest  on  the 
first  day  of  class,  coefficient  alpha  for  Test  B  was  .398,  increasing  to  .630 
when  administered  at  the  final  exam  posttest.  These  low  values  indicate  that 
the  average  interitem  correlation  coefficient  was  correspondingly  small.  Even 
though  both  reliability  coefficients  were  relatively  low,  the  fact  that  the  re¬ 
liability  coefficient  increased  from  .40  to  .63  may  be  an  indication  that  the 
items  were  functioning  as  a  set  differently  after  instruction  than  they  were 
before  instruction.  As  before,  however,  this  increase  may  simply  be  reflecting 
the  increase  in  the  variance  of  the  test  scores. 

Number  of  factors  extracted.  Appendix  Table  G  presents  the  eigenvalues  and 
percentages  of  total  variance  accounted  for  by  the  first  15  factors  extracted 
from  Test  B  and  from  corresponding  random  data.  Figure  8a  presents  the  plots  of 
these  eigenvalues  versus  factors  extracted  at  the  pretest,  and  Figure  8b  pre¬ 
sents  similar  data  from  the  final  exam  posttest.  Comparison  of  the  results  from 
the real data with the results from the random data reveals that there was no
factor  stronger  than  one  extracted  from  the  random  data  in  the  pretest,  but  one 
stronger  factor  was  extracted  from  Test  B  at  the  final  exam  posttest. 

Factor  similarity.  Table  8  presents  the  factor  loadings  on  the  single  fac¬ 
tor  extracted  at  each  testing  occasion  from  Test  B  and  from  corresponding  random 
data.  Comparison  of  these  factor  loadings  reveals  that  the  loadings  from  the 




Figure  8 

Eigenvalues  for  the  First  15  Factors  Extracted  from  Biology  Test  B 
Administered  at  Pretest  and  at  Final  Exam  Posttest, 
and  from  Corresponding  Random  Data 

(a)  Pretest 


Table 8

Factor Loadings on the Single Factor Extracted from Biology Test B
at Pretest and at Final Exam Posttest and from Corresponding Random Data

               Pretest                  Posttest
Item     Test B   Random Data     Test B   Random Data
  1       .131       .088          .295      -.044
  2       .073       .087          .310       .377
  3      -.023      -.168          .193       .258
  4       .218       .122          .416       .098
  5       .252      -.286          .137       .113
  6       .268       .145          .240       .179
  7       .191       .145          .256      -.236
  8       .127      -.113          .296       .246
  9      -.044       .293          .273      -.066
 10       .323      -.320          .255       .296
 11       .193       .471          .202       .060
 12       .164       .117          .311      -.239
 13       .393      -.111          .371       .161
 14      -.007      -.136          .438       .030
 15       .228      -.085          .261       .045
 16       .329      -.099          .301       .284
 17       .246      -.252          .310       .193
 18       .154       .381          .372      -.073
 19       .192      -.098          .241       .006
 20      -.027       .341          .193      -.013
 21       .231      -.151          .307       .092
 22      -.239      -.156          .268       .411
 23       .459       .213          .299       .162
 24       .062       .067          .079       .140
 25       .009       .182          .330      -.037
 26       .045      -.101          .174      -.044
 27      -.101       .034         -.112      -.057
 28       .130      -.080          .043       .112
 29       .296      -.245          .084       .088
 30       .215       .077          .155       .328
 31       .252       .179          .397       .003
 32       .278       .020          .177      -.123
 33      -.045       .045         -.112      -.082
 34       .028      -.277          .137       .003
 35       .012       .384          .165       .093
 36       .166      -.012         -.071       .047
 37      -.115      -.034         -.023      -.026
 38       .018       .060         -.002       .009
 39       .082       .120          .011       .053
 40       .040       .109          .178      -.088
 41       .013      -.457          .205      -.015
 42      -.058       .510         -.111      -.071
Percent of
Total Variance
          3.69       4.70          5.96       2.54



final  exam  posttest  were,  in  general,  slightly  higher  than  those  from  the  pre¬ 
test.  The  highest  pretest  loading  was  .459,  and  nearly  two-thirds  of  the  factor 
loadings  (27  of  42)  were  less  than  .200.  For  the  final  exam  posttest,  the  high¬ 
est  loading  was  .438,  but  more  than  half  of  the  factor  loadings  (23  of  42)  were 
greater  than  .200. 

This  result  can  also  be  seen  by  comparing  the  percentage  of  total  variance 
accounted  for  by  the  single  factor  extracted  at  each  administration.  For  the 
pretest,  that  figure  was  3.69%  (as  compared  to  4.70%  accounted  for  by  the  random 
factor);  for  the  final  exam  posttest,  the  factor  accounted  for  5.96%  of  the  to¬ 
tal  variance  (as  compared  to  2.54%  for  the  random  data).  Both  of  these  percent¬ 
ages  are  very  small,  indicating  that  the  factor  was  relatively  weak. 

The  pattern  of  factor  loadings  did  not  appear  consistent  across  test  admin¬ 
istrations.  The  items  with  the  lowest  loadings  at  the  pretest  did  not  necessar¬ 
ily  emerge  as  the  items  with  the  lowest  loadings  at  the  final  exam  posttest,  and 
the  same  was  true  for  the  items  with  the  highest  loadings. 

Table  9  presents  the  measures  of  factor  similarity  for  Test  B.  The  root- 
mean-square  deviation  between  the  two  sets  of  loadings  for  Test  B,  sensitive  to 
differences  in  levels  of  the  loadings,  was  .177,  a  high  value  when  considered  in 
conjunction with the relatively narrow range of loadings observed in these data
but lower than the .300 observed for the two sets of random data. The product-moment
correlation coefficient between the loadings, sensitive to pattern differences,
was a low .399 as contrasted with r = -.327 for the random data. The
coefficient  of  congruence  was  .697  for  Test  B  and  -.255  for  the  random  data. 
Although  the  comparison  of  the  similarity  measures  reveals  that  the  factor  load¬ 
ings  for  Test  B  were  more  congruent  than  the  corresponding  sets  of  random  data, 
the  degree  of  similarity  was  so  low  that  these  factors  could  not  justifiably  be 
considered  congruent. 


Table 9

Measures of Factor Similarity Between Factor Loadings from Test B
at Pretest and at Final Exam Posttest, and Between Factor Loadings
from Corresponding Random Data

Similarity Index                      Test B    Random Data
Root-Mean-Square Deviation             .177        .300
Pearson Product-Moment Correlation     .399       -.327
Coefficient of Congruence              .696       -.255

These data reveal, then, that the factor extracted from Test B at the pretest
differed from the factor extracted at posttest. As was observed for Test A,
there was a sizeable increase in the number-correct scores, accompanied by a
change in the factor underlying the item responses. This indicates that the
pretest and the final exam posttest were measuring quite different variables,
even though they were composed of exactly the same items.




Conclusions 


Differences  in  Achievement  Level  Estimates 


The results from both Test A and Test B indicate that there were mean
differences in achievement level estimates (number-correct scores) that
accompanied classroom instruction. On the average, test scores increased after
relevant course instruction; for these data, scores increased between 6 and 7.5
points on a 42-item test. The increases in these test scores were not
attributable to the effect of item repetition. Although the differences were in
the predicted direction, neither a t test nor the Kolmogorov-Smirnov two-sample
test was significant at p = .01.
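The two significance tests mentioned above can be sketched from the appendix
data. The sketch assumes that the item-repetition comparison is the Group 1
versus Group 2 contrast summarized in Table C, uses a pooled-variance t
statistic, and uses the large-sample approximation for the Kolmogorov-Smirnov
critical value; none of these computing details is stated in this section, so
all three are assumptions. The frequencies are transcribed from Table C for
scores 9 through 41.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic from summary statistics (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def ks_two_sample(freq1, freq2):
    """Largest gap between two cumulative proportion distributions."""
    n1, n2 = sum(freq1), sum(freq2)
    cum1 = cum2 = 0
    d = 0.0
    for f1, f2 in zip(freq1, freq2):
        cum1 += f1
        cum2 += f2
        d = max(d, abs(cum1 / n1 - cum2 / n2))
    return d


def ks_critical(n1, n2, alpha):
    """Large-sample approximation to the two-sided critical value of D."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n1 + n2) / (n1 * n2))


# Frequencies for scores 9, 10, ..., 41, transcribed from Table C.
group1 = [1, 0, 1, 2, 1, 1, 1, 1, 3, 2, 5, 3, 9, 7, 10, 7, 6,
          6, 5, 4, 6, 5, 2, 3, 2, 1, 0, 1, 2, 0, 0, 0, 1]
group2 = [0, 1, 2, 1, 1, 1, 3, 5, 5, 9, 4, 6, 6, 5, 6, 2, 5,
          5, 6, 1, 3, 1, 4, 2, 1, 3, 1, 0, 1, 1, 0, 0, 0]

# Means and SDs as reported at the foot of Table C.
t, df = pooled_t(24.19, 5.87, 98, 22.59, 6.26, 91)
d = ks_two_sample(group1, group2)
crit = ks_critical(len_1 := sum(group1), len_2 := sum(group2), 0.01)

print(f"t = {t:.2f} on {df} df")
print(f"D = {d:.3f}, approximate .01 critical value = {crit:.3f}")
```

Under these assumptions the t statistic falls below the two-tailed .01 critical
value for 187 degrees of freedom (roughly 2.6), and D falls below its
approximate .01 critical value, which agrees with the statement that neither
test reached significance at that level.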

Differences  in  the  Structure  of  Achievement 


There were substantial differences in the structure of responses to the items
on both biology tests (Test A and Test B) from the pretest to the posttest.
Large increases in the internal consistency reliability coefficient may reflect
corresponding changes in the average interitem correlation coefficients. That
is, changes in the way the items functioned together as a set were evident after
instruction took place. This same effect was observed when the factor structures
of the tests at both administrations were compared. Although only one factor was
extracted at each administration of each test, the factor at each pretest was
very weak and bore little relationship to the factor extracted later in the
course, as reflected in the patterns and levels of the factor loadings.
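The link between internal consistency and the average interitem correlation can
be made concrete with a minimal sketch. It assumes the standardized
(Spearman-Brown) form of coefficient alpha, which may not be the reliability
coefficient actually computed in the report, and the two average correlations
are hypothetical values rather than results from Test A or Test B.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized alpha from the number of items and the mean interitem r."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Hypothetical mean interitem correlations for a 42-item test.
weak = standardized_alpha(42, 0.02)
stronger = standardized_alpha(42, 0.06)

print(round(weak, 3), round(stronger, 3))
```

For a test of this length, a change in the mean interitem correlation from .02
to .06 would move the coefficient from about .46 to about .73, so a small shift
in how the items correlate can appear as a large change in reliability.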


DISCUSSION  AND  CONCLUSIONS 


The results of these studies show that the use of simple difference scores to
measure change in classroom achievement may not be appropriate for all subject
matter areas. The use of simple difference scores, or some derivative thereof,
assumes that there is only a quantitative difference between pretest and
posttest achievement levels due to a course of instruction. That is, the
assumption is made that a pretest measures a baseline amount of some knowledge
or trait and that classroom instruction results in increased levels of the same
trait, as indicated by higher scores on the same, or a similar, test.

This assumption was supported by the results of the mathematics data. There
was a large and statistically significant difference observed in achievement
test scores obtained before and after instruction. That the same trait was being
measured both times was indicated by the high degree of similarity of the
underlying factor structure of the test when examined at both points in time.
The only change observed in the mathematics test scores was, then, a
quantitative one, reflected in increases in mean number-correct score after
classroom instruction in mathematics.

The results were quite different for the two biology tests examined. Factor
analyses of the pretests revealed the presence of one very weak factor for each
pretest. One slightly stronger factor also emerged at each of the posttests, but
there was very little correspondence between the pretest and posttest factors.
Even though mean test scores increased after instruction, there was a
corresponding difference in the factors underlying test performance. The change
that occurred in the biology test scores, then, was a qualitative one, where the
tests were measuring different variables before and after instruction.
Evaluating gains in achievement by computing pretest-posttest difference scores
cannot be justified under these circumstances.

That the results from these two studies are different has important bearing on
the issue of program evaluation and the measurement of change. The question of
whether the difference in test scores that follows classroom instruction or
program participation is quantitative or qualitative must be answered before any
attempt at quantifying change can legitimately be made. For some courses of
instruction, the application of classical change-score methodology may be
defended on the grounds that the only change observed was quantitative; for
others, the use of such methodology may not be justified. Clearly, further
research is needed to define those areas where the use of change scores or their
derivatives may be warranted.




REFERENCES 


Anastasi, A. The influence of specific experience upon mental organization.
Genetic Psychology Monographs, 1936, 18, 245-355.

Cronbach, L. J., & Furby, L. How should we measure "change"--or should we?
Psychological Bulletin, 1970, 74, 68-80.

Ferguson, G. A. On learning and human ability. Canadian Journal of Psychology,
1954, 8, 95-112.

Fleishman, E. A. A factor analysis of intra-task performance on two psychomotor
tasks. Psychometrika, 1953, 18, 45-55.

Fleishman, E. A. A comparative study of aptitude patterns in unskilled and
skilled psychomotor performances. Journal of Applied Psychology, 1957, 41,
263-272.

Fleishman, E. A. Abilities at different stages of practice in rotary pursuit
performance. Journal of Experimental Psychology, 1960, 60, 162-171.

Fleishman, E. A., & Fruchter, B. Factor structure and predictability of
successive stages of learning Morse code. Journal of Applied Psychology, 1960,
44, 97-101.

Fleishman, E. A., & Hempel, W. E., Jr. Changes in factor structure of a complex
psychomotor task as a function of practice. Psychometrika, 1954, 19, 239-252.

Fleishman, E. A., & Hempel, W. E., Jr. The relationship between abilities and
improvement with practice in a visual discrimination reaction task. Journal of
Experimental Psychology, 1955, 49, 301-312.

Games, P. A. A factorial analysis of verbal learning tasks. Journal of
Experimental Psychology, 1962, 63, 1-11.

Garrett, H. E. A developmental theory of intelligence. American Psychologist,
1946, 1, 372-378.

Greene, E. B. An analysis of random and systematic changes with practice.
Psychometrika, 1943, 8, 37-52.

Harman, H. H. Modern factor analysis (3rd ed.). Chicago: University of Chicago
Press, 1976.

Kingsbury, G. G., & Weiss, D. J. Effect of point-in-time in instruction on the
measurement of achievement (Research Report 79-4). Minneapolis: University of
Minnesota, Department of Psychology, Psychometric Methods Program, August 1979.

Lord, F. M. Elementary models for measuring change. In C. W. Harris (Ed.),
Problems in measuring change. Madison: University of Wisconsin, 1963.

Nesselroade, J. R., & Baltes, P. B. On a dilemma of comparative factor
analysis: A study of factor matching based on random data. Educational and
Psychological Measurement, 1970, 30, 935-948.

Nie, N. H., Hull, C. H., Jenkins, J. G., Steinbrenner, K., & Bent, D. H.
Statistical package for the social sciences (2nd ed.). New York: McGraw-Hill,
1975.

Querishi, M. Y. Patterns of psycholinguistic development during early and middle
childhood. Educational and Psychological Measurement, 1967, 27, 353-365.

Reinert, G. Comparative factor analytic studies of intelligence throughout the
human life span. In L. R. Goulet & P. B. Baltes (Eds.), Life-span developmental
psychology: Research and theory. New York: Academic Press, 1970.

Sullivan, J. P., & Moran, L. J. Association structures of bright children at age
six. Child Development, 1967, 38, 793-800.

Swartz, J. D., & Moran, L. J. Association structures of bright children at ages
nine and twelve. Multivariate Behavioral Research, 1968, 3, 189-198.

Wohlwill, J. F. Methodology and research strategy in the study of developmental
change. In L. R. Goulet & P. B. Baltes (Eds.), Life-span developmental
psychology: Research and theory. New York: Academic Press, 1970.

Woodrow, H. The relation between abilities and improvement with practice.
Journal of Educational Psychology, 1938, 29, 215-230.

Woodrow, H. The application of factor-analysis to problems of practice. Journal
of General Psychology, 1939, 21, 457-460. (a)

Woodrow, H. Factors in improvement with practice. Journal of Psychology, 1939,
7, 55-70. (b)

Woodrow, H. The relation of verbal ability to improvement with practice in
verbal tests. Journal of Educational Psychology, 1939, 30, 179-186. (c)




Appendix:  Supplementary  Tables 


Table A

Frequency Distributions of Number-Correct Scores
for APT Pretest and Posttest (N = 220)

                   Pretest                          Posttest
                           Cumulative                      Cumulative
Score  Frequency  Percent     Percent  Frequency  Percent     Percent
---------------------------------------------------------------------
   35          0      0.0       100.0          4      1.8       100.0
   34          1      0.5       100.0         20      9.1        98.2
   33          4      1.8        99.5         28     12.7        89.1
   32          7      3.2        97.7         29     13.2        76.4
   31          7      3.2        94.5         19      8.6        63.2
   30         13      5.9        91.4         25     11.4        54.5
   29          5      2.3        85.5         16      7.3        43.2
   28         13      5.9        83.2         19      8.6        35.9
   27          5      2.3        77.3         11      5.0        27.3
   26          8      3.6        75.0          8      3.6        22.3
   25         14      6.4        71.4          7      3.2        18.6
   24         20      9.1        65.0          7      3.2        15.5
   23         17      7.7        55.9          6      2.7        12.3
   22         10      4.5        48.2          1      0.5         9.5
   21         14      6.4        43.6          5      2.3         9.1
   20         16      7.3        37.3          4      1.8         6.8
   19          6      2.7        30.0          1      0.5         5.0
   18         11      5.0        27.3          3      1.4         4.5
   17         11      5.0        22.3          1      0.5         3.2
   16          9      4.1        17.3          0      0.0         2.7
   15          7      3.2        13.2          0      0.0         2.7
   14          2      0.9        10.0          1      0.5         2.7
   13          4      1.8         9.1          2      0.9         2.3
   12          4      1.8         7.3          1      0.5         1.4
   11          7      3.2         5.5          0      0.0         0.9
   10          1      0.5         2.3          1      0.5         0.9
    9          0      0.0         1.8          0      0.0         0.5
    8          3      1.4         1.8          1      0.5         0.5
    7          1      0.5         0.5          0      0.0         0.0
---------------------------------------------------------------------
Mean                22.26                           28.91
SD                   5.97                            4.88
Median              22.74                           30.10
Mode                   24                              32

Table B

Eigenvalues and Percent of Total Variance
Accounted for by First 15 Factors Extracted from the APT
at Pretest and at Posttest, and from Corresponding Random Data

                     Pretest                               Posttest
              APT          Random Data            APT          Random Data
         Eigen-   % Total   Eigen-   % Total   Eigen-   % Total   Eigen-   % Total
Factor    value  Variance    value  Variance    value  Variance    value  Variance
----------------------------------------------------------------------------------
     1    5.350      15.3    1.545       4.4    5.590      16.0    1.419       4.1
     2    1.555       4.4    1.308       3.7    1.605       4.6    1.253       3.6
     3    1.539       4.4    1.229       3.5    1.337       3.8    1.161       3.3
     4    1.209       3.5    1.139       3.3    1.171       3.3    1.134       3.2
     5    1.086       3.1    1.029       2.9    1.034       3.0    1.052       3.0
     6    1.016       2.9     .993       2.8    1.006       2.9    1.023       2.9
     7     .942       2.7     .890       2.5     .986       2.8     .896       2.6
     8     .892       2.5     .865       2.5     .939       2.7     .828       2.4
     9     .876       2.5     .822       2.3     .839       2.4     .814       2.3
    10     .794       2.3     .767       2.2     .797       2.3     .790       2.3
    11     .739       2.1     .745       2.1     .756       2.2     .770       2.2
    12     .666       1.9     .692       2.0     .675       1.9     .732       2.1
    13     .607       1.7     .634       1.8     .660       1.9     .702       2.0
    14     .597       1.7     .600       1.7     .604       1.7     .666       1.9
    15     .553       1.6     .566       1.6     .533       1.5     .600       1.7

Table C

Frequency Distribution of Number-Correct Scores for
Biology Test A at MQ Posttest for Students in Groups 1 and 2

               Group 1 (N = 98)                 Group 2 (N = 91)
                           Cumulative                      Cumulative
Score  Frequency  Percent     Percent  Frequency  Percent     Percent
---------------------------------------------------------------------
   41          1      1.0       100.0          0      0.0       100.0
   40          0      0.0        99.0          0      0.0       100.0
   39          0      0.0        99.0          0      0.0       100.0
   38          0      0.0        99.0          1      1.1       100.0
   37          2      2.0        99.0          1      1.1        98.9
   36          1      1.0        96.9          0      0.0        97.8
   35          0      0.0        95.9          1      1.1        97.8
   34          1      1.0        95.9          3      3.3        96.7
   33          2      2.0        94.9          1      1.1        93.4
   32          3      3.1        92.9          2      2.2        92.3
   31          2      2.0        89.8          4      4.4        90.1
   30          5      5.1        87.8          1      1.1        85.7
   29          6      6.1        82.7          3      3.3        84.6
   28          4      4.1        76.5          1      1.1        81.3
   27          5      5.1        72.4          6      6.6        80.2
   26          6      6.1        67.3          5      5.5        73.6
   25          6      6.1        61.2          5      5.5        68.1
   24          7      7.1        55.1          2      2.2        62.6
   23         10     10.2        48.0          6      6.6        60.4
   22          7      7.1        37.8          5      5.5        53.8
   21          9      9.2        30.6          6      6.6        48.4
   20          3      3.1        21.4          6      6.6        41.8
   19          5      5.1        18.4          4      4.4        35.2
   18          2      2.0        13.3          9      9.9        30.8
   17          3      3.1        11.2          5      5.5        20.9
   16          1      1.0         8.2          5      5.5        15.4
   15          1      1.0         7.1          3      3.3         9.9
   14          1      1.0         6.1          1      1.1         6.6
   13          1      1.0         5.1          1      1.1         5.5
   12          2      2.0         4.1          1      1.1         4.4
   11          1      1.0         2.0          2      2.2         3.3
   10          0      0.0         1.0          1      1.1         1.1
    9          1      1.0         1.0          0      0.0         0.0
---------------------------------------------------------------------
Mean                24.19                           22.59
SD                   5.87                            6.26
Median              23.79                           21.80
Mode                   23                              18



Table D

Frequency Distribution of Number-Correct Scores
for Biology Test A at Pretest and at MQ Posttest

               Pretest (N = 272)                Posttest (N = 283)
                           Cumulative                      Cumulative
Score  Frequency  Percent     Percent  Frequency  Percent     Percent
---------------------------------------------------------------------
   41          0      0.0       100.0          1      0.4       100.0
   40          0      0.0       100.0          0      0.0        99.6
   39          0      0.0       100.0          0      0.0        99.6
   38          0      0.0       100.0          1      0.4        99.6
   37          0      0.0       100.0          4      1.4        99.3
   36          0      0.0       100.0          2      0.7        97.9
   35          0      0.0       100.0          3      1.1        97.2
   34          0      0.0       100.0          4      1.4        96.1
   33          0      0.0       100.0          5      1.8        94.7
   32          0      0.0       100.0          6      2.1        92.9
   31          0      0.0       100.0          9      3.2        90.8
   30          1      0.0       100.0          8      2.8        87.6
   29          0      0.0        99.6         15      5.3        84.8
   28          1      0.4        99.6          9      3.2        79.5
   27          1      0.4        99.3         17      6.0        76.3
   26          0      0.0        98.9         16      5.7        70.3
   25          2      0.7        98.9         23      8.1        64.7
   24          5      1.8        98.2         15      5.3        56.5
   23          8      2.9        96.3         24      8.5        51.2
   22          6      2.2        93.4         15      5.3        42.8
   21          8      2.9        91.2         19      6.7        37.5
   20          9      3.3        88.2         14      4.9        30.7
   19         25      9.2        84.9         10      3.5        25.8
   18         23      8.5        75.7         16      5.7        22.3
   17         34     12.5        67.3         13      4.6        16.6
   16         23      8.5        54.8          9      3.2        12.0
   15         24      8.8        46.3          7      2.5         8.8
   14         30     11.0        37.5          5      1.8         6.4
   13         25      9.2        26.5          3      1.1         4.6
   12         13      4.8        17.3          3      1.1         3.5
   11         15      5.5        12.5          4      1.4         2.5
   10          7      2.6         7.0          1      0.4         1.1
    9          5      1.8         4.4          2      0.7         0.7
    8          3      1.1         2.6          0      0.0         0.0
    7          3      1.1         1.5          0      0.0         0.0
    6          0      0.0         0.4          0      0.0         0.0
    5          0      0.0         0.4          0      0.0         0.0
    4          1      0.4         0.4          0      0.0         0.0
---------------------------------------------------------------------
Mean                15.97                           23.46
SD                   3.97                            5.99
Median              15.94                           23.35
Mode                   17                              23



Table  E 

Eigenvalues  and  Percent  of  Total  Variance  Accounted  for  by 
First  15  Factors  Extracted  from  Biology  Test  A  at  Pretest 
and  at  MQ  Posttest  and  Corresponding  Random  Data 




Table F

Frequency Distribution of Number-Correct Scores
for Biology Test B at Pretest and at Final Exam Posttest

               Pretest (N = 277)                Posttest (N = 163)
                           Cumulative                      Cumulative
Score  Frequency  Percent     Percent  Frequency  Percent     Percent
---------------------------------------------------------------------
   33          0      0.0       100.0          1      0.6       100.0
   32          1      0.4       100.0          2      1.2        99.4
   31          0      0.4       100.0          3      1.8        98.2
   30          0      0.4       100.0          1      0.6        96.3
   29          0      0.4       100.0          5      3.1        95.7
   28          0      0.4       100.0          8      4.9        92.6
   27          0      0.4       100.0          8      4.9        87.7
   26          0      0.0       100.0          6      3.7        82.8
   25          1      0.4        99.6          5      3.1        79.1
   24          2      0.7        99.3          8      4.9        76.1
   23          4      1.4        98.6         13      8.0        71.2
   22          4      1.4        97.1         16      9.8        63.2
   21          6      2.2        95.7         17     10.4        53.4
   20         10      3.6        93.5         15      9.2        42.9
   19         12      4.3        89.9         10      6.1        33.7
   18         27      9.7        85.6         12      7.4        27.6
   17         31     11.2        75.8         10      6.1        20.2
   16         29     10.5        64.6         10      6.1        14.1
   15         30     10.8        54.2          5      3.1         8.0
   14         29     10.5        43.3          3      1.8         4.9
   13         23      8.3        32.9          3      1.8         3.1
   12         22      7.9        24.5          0      0.0         1.2
   11         21      7.6        16.6          2      1.2         1.2
   10         16      5.8         9.0          0      0.0         0.0
    9          7      2.5         3.2          0      0.0         0.0
    8          2      0.7         0.7          0      0.0         0.0
---------------------------------------------------------------------
Mean                15.18                           21.47
SD                   3.54                            4.58
Median              15.12                           21.18
Mode                   17                              21



Table G

Eigenvalues and Percent of Total Variance Accounted for by First
15 Factors Extracted from Biology Test B at Pretest and at Final Exam
Posttest and from Corresponding Random Data

                     Pretest                          Final Exam Posttest
             Test B        Random Data           Test B        Random Data
         Eigen-   % Total   Eigen-   % Total   Eigen-   % Total   Eigen-   % Total
Factor    value  Variance    value  Variance    value  Variance    value  Variance
----------------------------------------------------------------------------------
     1    2.043       4.9    2.440       5.8    3.124       7.4    1.810       4.3
     2    1.551       3.7    1.448       3.4    1.920       4.6    1.678       4.0
     3    1.345       3.2    1.190       2.8    1.590       3.8    1.550       3.7
     4    1.204       2.9    1.146       2.7    1.480       3.5    1.513       3.6
     5    1.152       2.7    1.098       2.7    1.383       3.3    1.466       3.5
     6    1.065       2.5    1.053       2.5    1.309       3.1    1.370       3.3
     7     .932       2.2     .999       2.4    1.284       3.1    1.305       3.1
     8     .911       2.2     .929       2.2    1.167       2.8    1.234       2.9
     9     .887       2.1     .920       2.2    1.151       2.7    1.215       2.9
    10     .835       2.0     .852       2.0    1.059       2.5    1.105       2.6
    11     .796       1.9     .770       1.8     .978       2.3    1.030       2.5
    12     .781       1.9     .739       1.8     .964       2.3     .966       2.3
    13     .747       1.8     .702       1.7     .927       2.2     .895       2.1
    14     .709       1.7     .684       1.6     .911       2.2     .857       2.0
    15     .685       1.6     .668       1.6     .819       2.0     .803       1.9
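The relation between the two columns reported for each factor in Table G can be
sketched as follows. The sketch assumes that the eigenvalues come from a
correlation matrix with unities on the diagonal, so that total variance equals
the number of items; that assumption is not stated in the table, and the
function name is only illustrative. The 42-item test length is the figure given
in the Conclusions.

```python
def percent_of_total_variance(eigenvalue, n_items):
    """Percent of total variance, taking total variance as the item count."""
    return 100.0 * eigenvalue / n_items


N_ITEMS = 42  # test length given in the text for the biology tests

# First-factor eigenvalues transcribed from Table G.
first_factor = {
    "Test B, pretest": 2.043,
    "Random data, pretest": 2.440,
    "Test B, final exam posttest": 3.124,
    "Random data, final exam posttest": 1.810,
}

percents = {label: round(percent_of_total_variance(value, N_ITEMS), 1)
            for label, value in first_factor.items()}

for label, pct in percents.items():
    print(f"{label}: {pct}")
```

Under this assumption the computed percentages agree with the tabled values for
the first factor, which suggests that each percentage in the table is simply the
eigenvalue divided by the number of items.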

Distribution  List 


■  •  '•  y 

Dr  .  r,l  Ail  i  n 

K:.vy  Per'-onni  1  RAD  C>  nt.er 
*'  *n  Di  •  ,’o ,  oa  r  7 1‘>? 

Meryl  **.  P  .per 

nprpo 
Cod:"  P'r'‘ 

<';»r  Ds>fo,  DA  c.1  i<-p 

Dr.  J.-cp  R.  forsMng 
Provost  *  fr  -domic  P*  ;  n 
U.F.  '.'.Tv)  Post  gr*  'lust  •  School 
Nortorey,  CA  0"O.*iD 

CM"f  of  n?v;]  Educ.  *  ion  and  Training 
Li.,  son  Of  fir/' 

Air  Force  Hunan  Fe source  Labor; tory 
Flying  Tr .'ining  Division 
W  f  t.LTA.'T  A  Ffi,  -7 

rpR  *'i  kr  Curr-n 
''ffir.e  Of  K«vp1  Rose.' ■*--» 

"D1'  v .  Cn inny  Ft-  . 

Ccie  -,‘,r 

A r 1 inp ton ,  VA  rn?17 

DR.  PAT  FEDERICO 

NAVY  PERSONNEL  RAD  CFNTE* 

FAN  DIEGO.  01  c-*  1 5P 

Mr.  Paul  Folry 

Navy  Personnel  RAD  Center 

5.  >n  Diego,  CA  7?16? 

Dr .  John  Ford 

•Ovy  Personnel  RAD  Center 

Frn  Di^go,  CA  optns 

nr.  Pr.trinP  P.  Harrison 
Psychology  0 ou**5c  Di  •“f'dor 
LF A PKR^MTP  '*  LAW  DEPT,  '"b' 

DTV.  OF  PROFFSr TONAL  DFVELpPM^EN'r 
’J.r.  t:AV1L  s CADFMY 
A NMAPOLTS ,  MD  ptUOP 

Dr.  Nornrn  J.  Kerr 

OMef  of  Hava!  Technical  Training 

*'avfl  Air  "tat  ion  Memphis  (T) 

Milliner  on.  TP  ’Roc.i- 

Dr  .  Will  i  »n  L.  Maloy 
Principal  Civilian  Advisor  for 
Education  and  Training 
Naval  Training  Command ,  Code  DOA 
Ponsarol^,  FL  7?5p° 

CAPT  Pif>rrd  L.  MarMn,  tJSN 
Prosp/'p’  ive  Comm''ndln£  Of  fir/ r 
UTT  c->r]  Vi  nr,or  (pV»U  "D) 

N'wpor*  Shipbuilding  and  Drydor!<  Co 

*' -v-por*  *irv»s ,  Vf 

.  .1  n  Me  Pr  t  *tf 
*:  -  v y  P-  r  .vnnr]  P*«D  Center 
c  *r  Pi  M*0.  CA  ':?lr? 


1 


1 


f- 


1 


1 


1 


1 


1 


Dr  Wi  1 !  :  '*nj  M-ont.gue 
Navy  P'-rronnrl  LAD  Cf-nt  *r 
Dan  Diego,  Cs 


1  Dr  .  f  •  rn* r  i  hm!  rd  .  '  u 
f  1  •  v  y  Pi-  r  r  o  r.  n  e  1  P1r'  r*r**-r 
S-r  Pi /-r.c  .  0*  '  0ir  • 


“r  .  l.’il  1  i?n  “ordbro'  W 

'nst  rue t  ion.  1  Proj*r;n  D»  v.-lopru-nt  1 

n-*g.  nD 

NFT-PDCD 

Gro*  ♦  Lakes  Naval  Tr/ining  fi  n' 

TL  f-'T*? 

"Td  0.  T.  Yellrn  7 

T/*chnir.-’!  Tnformat  ion  Of  fir* 

NAVY  PERc'riNNFL  RAD  C'FMTFR 
DAM  DTFGp,  CA  9?V  ? 

Library.  Coer  P?OTL  ] 

Mr.vy  Personnel  R* D  rrntnr 
Fan  Fiego.  pA  n'Nirp 

Tcctnicr1  Director 

Nrvy  Personnel  F D  Center 

Fan  Diego.  CA  7 

Conriiand ing  Of fierr 

Nav; 1  Pf search  Labor; tory 

Coda  ?fi? 7  7 

Washington,  DO  ?C™0 

Psychologist 
OMR  Branch  pfficr 

PI dg  1 !*»,  Feet  ion  D  1 

F6F  Summer  Ft r ret 
Poston,  MA  0?21 0 

Psychologist 
ONR  Drench  Office 
r?6  F.  Clark  Street 
Chicago.  IL  f0f>06 

Office  of  Naval  Research 
Code 

pop  N.  Quincy  Fftrrrt  ? 

Arlington,  VA  22217 

Office  of  Navel  Research 
Code  A'M 

BOO  N.  Cuincy  Street  1 

Arlington,  VA  ???17 

Personnel  **  Training  Research  Progr, 
(Code 

Office  of  Naval  Researrh 
Arlington,  VA 

Psychol ogi st 
ONR  Branch  Office 
IP^C  East  Green  Street 
Pnsrden;  ,  C*  91101 


rr  .  o i  t  h  Sr  t  1  •  f"  . 

F<  :.e  r'-f  ,  D-'v/ 1  opr>*  r*  , 

*’  -  • 

Fiv  ’  Fdu<  t  ion  <  r*  T r-;r;rr  r  "*  • 
»»/*'.  r<TV-  rC.’  ,  FL.  ’  r  r  • 

Dr  .  Pot.-r  t  G  .  Drr, ; '  ' 

•  Off  i  r  •'  of  oft  .  '  r  \  i-  :  ■  '  - 

V'-fhinf  ’  or. ,  DC  .  r  Cr 
Dr.  Alfred  F.  Fwodf 

Tr.  ining  Analysis  A  Fv  1  lm*  :m.  C-r  e-^t 
(TAFG) 

Drpt .  of  **■'  f.'csvy 
Or  1 nndo ,  FL  ‘ OF l “ 

Dr.  Richard  Sorensen 
Navy  Pcrsornrl  F*-D  g.-nt/r 
Frn  Diego.  Ct  9?lr? 

1».  Gary  Thomson 

Naval  OpL.'n  Systems  Center 

Codr  7 1  ■? 

Frn  Oirgo,  CA  P71D? 

Roger  Wr issinger-B:  yl on 

DeparTmr-nt  of  Administrative  Sciences 
N'ival  Postgraduate  Fchool 
Monterey,  CA  9?n«e 

Dr.  Ronald  V’MtziTKn 
Cod-'  WZ 

Dcp<- rtrnent  of  Admi  nistrat  i  ve  F'Cjenres 
U.  F.  Naval  Postgraduate  Fctoo1 
Monterey,  CA  Q^oiic 

Dr.  Robert.  Wisher 
Code  T9 

N;jvy  Personnel  PAD  Center 
Scn  Diego.  CA  ??16P 

DR.  MARTTN  F.  WISKOFF 
NAVY  PERFONNFL  RA  D  CENTFP 
FAN  DTFG0 .  CA  9?lr>? 

•ms 

Mr  John  H.  Wolf*' 

Code  P 7 1 C 

U.  5.  Navy  Personnel  Research  and 
Development  Center 
Fan  Diego,  CA  ^216? 

Army 


Office  of  fhc  Chief  of  Nnval  Operations 

r-SFrr^p  ^r,10^"'  *  nU:11r5  rri,nC''T.rhnlc=I  Mr^.or 

W$sti1  n<*on,  'pc  u-  er">'  Tor 

Behavior  r\  ;.nd  Social  Sri  one 

LT  Frrril  C.  P.'tlo,  ITT,  U^N  (Ph.tl)  c°r’  IlSfU-ovr  l.iTO 
rioli-clion  «nd  Tminirig  Rpspprrh  niv  A 1  f*i  ’ndr  i  ;■ ,  VA  .  • 

Nuinr.n  Performance  Sciences  Dept. 

Navi-1  Aerospare  Medical  F«*3«vrr*h  l;  bore? 

PensacolT,  FL  '.?r»pF 


Dr.  ►‘yrt  n  Fis^M 

U.f.  Amy  Res*'  r-'t*  InsU'ut-  for  •>" 
Po". '1  ;nd  Peh'viora’  Sciences 
5'1  41  Eisenhower  Avrni:» 

A1  cx  I r  l .  .  V'  T?  -  1 

*V  .  Viet 1  y  ’pi  *  r 
>*  <*  n pt‘v  ogernprvj  *SS7TT,.VTE 
t'Ti  f :r=FrHr',*FR  j»ve:»>F 
AL^X*  NTR  ’  *  ,  VJ 

Dr.  fil-or  ?.  F-tz 

■*>  ;n‘.n?  T'Chnir*l  A r*.: 

IJ.r.  Arry  Fes'.rch  *nV.  ;tu*e 
E:s.*rf*ov«r  Av*»nue 
AUx-ndri'  t  VA  22"“ 

Dr.  u.’ro Id  F.  ’W*-;;,  Jr. 

Attn:  PER' -OX 
Army  Research.  Tnstitute 
cnn  Eisentower  Avenue 
Alexandria,  VA. 

DR.  JPMES  L.  PAHEY 
r;.f,  ARMY  RESEARCH  IN5TTTtTE 
5Cn  EICENHCWFR  avenue 
ALEXANDRIA.  VA  P 2 ; " " 

Mr.  Rober*  Ross 

U.S.  A-ny  Fes*- irct  Tnsti*uhe  for  th>* 
focirl  and  Erhrviorsjl  Sciences 
5nP 1  Eisenhower  Avenue 
Alexandria.  VA 

Dr.  Robert  Scsnor 

U.  c.  Amy  Rese;r~h.  !r.sti‘ut*  for  the 
Pehavior»!  and  Socirl  Fciences 
5CC1  Eisenhower  Avpnu* 

A’txandria,  VA  22?'4? 

C ommandant 

US  Amy  Institute  of  Administration 
Attn:  Dr.  Sherrill 
FT  Benjamin  Harrison,  IN  4f25* 

Dr.  Joseph  Ward 
U.F.  Army  P»s«arch  Institute 
50^1  Eisenhower  Avenue 
Alexandria,  VA  ???’• 

Air  Force 


Air  Force  Hunan  Resources  Lcb 
AFHPL/HPD 
Brooks  AFP.  TX 

U.f.  Air  Force  D.fftc-  of  relent  if ic 
F'9S’f.rch 

L i f e  Coieroes  Directorate,  NL 
Eollinc  Air  Force  Peso 
Washington,  DC 

Dr.  F'-rl  A.  Alluisi 
HC.  AFHRL  '  A FfC', 

F-ookr,  AFP.  TX  7<?2'c- 

D-.  Alfred  F.  Frr-gly 
CF'^P/VL.  5!dg.  RtCl 

rr  *  *  1  -  -j  AFP  }p 

\  DC  2r^? 

Dr  .  -  7  1  *-■  V  c  H  hd.-d 

Pr-^r/r  *Mn-.'*r 

Li  f  *  *■’. ;  "0-5  r i  r  •  - r o r '-* * ■  t 

AF'DO 

no:  :  in,:  PFf  .  r>r  '7 


1  Dr.vi-1  R.  Hi.n!  r  r 
afhrl/moah 

Brooks  AFP,  TX  *T°7T^ 

1  F ^ set  r  -h  and  *•■*.  surnc  nt  Division 

Res  .  mb  Pr r nef  .  AFMpr/VFCYPf. 
P«Mol  ph  AFP,  TX  '*i:r 

1  Dr.  **alcolm  Re. 

*  FHFL/V'P 

Rrooks  AFr,  TX  'c 

2  TCHTW/^TCH  C»cp  72 
?*-•'  ppard  AFE,  TX  ~f  7i  ’ 

1  Dr.  Jo*-  Ward  ,  Jr. 

AFHRL/HP»‘D 
°rcoks  AFP,  TX 

M*>rines 


1  H .  ‘./ill  ian  Gre*  nup 

Education  Advisor  (Fr;i) 

Education  Center,  MCTEC 
QurnMoo.  VA  221  “4 

1  Director,  Office  of  Manpower  Utilization 
HC.  Mr r i r. '»  Corps  (MPU) 

PCP,  PIdg.  2Cnc 
Quantico,  VA  pp 1;4 

1  Headquarters ,  U.  S.  ferine  Corps 
Code  MPI-2*‘% 

Washington,  DC  2f?Q0 

y  Special  Assistant  for  Marine 
Corps  Matters 
Code  10f.M 

Df f ice  of  Naval  Research 
o^C  M.  Cuincy  3t . 

Arlington,  VA  2 7217 

y  Major  Michael  L.  Patrow,  IJSMC 
Headquarters,  Marine  Corps 
(Cede  MPI-2P) 

Washington,  DC  2f‘,P'' 

1  DR.  A.L.  SLAFKOSKY 

SCTENTTFIC  ADVISER  (CODE  RD-l ) 

HO,  U.S.  MARINE  CCRPS 
WASHTNCTON,  DC  PP?£0 

CoastGuard 


1  Chief,  Psychol ogicrl  Re sere h  Branch 
U.  f.  Const  Guard  (G-P-1 /2/TP42 ) 
Washington,  DC  2059° 

1  Mr .  Thomas  A.  Warm 

U.  S.  Cocst  Guard  Institute 
P.  C.  Substation  IP 
Oklahoma  City.  OK  *'<1^9 


Other  DoD 


Dcfens*3  Technical  'nformetion  Center 
Camsrcn  Station,  Bldg  5 
Alexandria,  VA  22^14 
Attn:  TC 

Dr  .  Wl  1 1  i;n  Gr-'t'an 
Testing  Director. *tn 
r'rPcnw/MFphT-p 

.  cherid?n,  TL  fPP"'' 


*  Mi-,  ctor,  Pcs";rr*  n-*  D- • 

"ir*»'<MpA,-L : 

T'1’  ,  7'  ••  per •.J.jf.r 
h*  r.t  in>'*  o”  ,  rr  p-  ■  -  • 

1  v  :  ’  :  r  y  **'  f  r-  '  -  r. .  r  g  ■  *■  . 

r.  r:onr.'  '  "'--r  • .  1  r  *•  > 
rff’Cr  cf  *  he  Cn  .  r  **  •  r.  ry  r.  f  T  ■■  f  - 
for  P -s*3.  rck  *■  Fn.‘. .  n*  ■  i  i  r.g 
Poom  '01  •*■' ,  Tna  P»  r  rc  r 
W  r»* .  r^tor. ,  TC  7'  '  "  1 

l  Dr  .  '  3y:.*  "•  '  ’riin 

Of  f  i  '  cf  tne  Assist."*  Kf  .  r  •  *  »-y 
of  D  * f-  r  S'--  ( *-'P A  ^  L  > 

2P2f?  T--  P'-p*  •  »cn 

1‘dsx  i  ngtQM ,  rc  I r  iry 

1  ?ADPA 

1 4''#'  j  l  ror.  PI  vd . 

"ri’ngton,  V f 


Mr ,  Rict  -.rd  f'cKi  1 1  ip 
Ptrsorn'1  FAD  ronter 
Office  of  Personnel  Mar  gen'nv 
lCnr  E  rTeet  mw 
Washington,  DC  PDCig 

Dr.  Andrew  P.  Molnar 
Science-  Education  Dev. 
and  P- search 

National  Science  Foundation 
Washington,  DC  m55r 

1  Dr.  H.  Wallace  Sinfiko 
Program  rirrctor 

Manpower  Research  and  Advisory  Service 
Cnithsonirn  Institution 
PCI  North  Pitt  Street 
Alexandria ,  VP  22; 14 

1  Dr.  Vern  Ihry 

Personnel  PAD  'nter 
Cfficc  of  Persvnnel  Management 
19DC  E  Street  W 
Washington,  DC  2^415 

1  Dr.  Joseph  L.  Young,  Director 
Memory  4  Cognitive  Processes 
National  Science  Foundation 
Washington.  DC  2C5S i 

No”  Govt 


1  Dr.  James  Plgine 

University  of  Florida 
Gainesville,  FL  32M1 

i  Dr.  Erllng  B.  Andersen 
Department  of  Statistics 
Stud  lest raede  * 

1455  Copenhagen 
DENMARK 

t  Dr.  John  Pnnett 

Department  of  Psychology 
University  of  Warwick 
Coventry  CV11  7AL 
FNCLAHD 

1  1  psychological  research  unit 

Dept,  of  Defense  (Army  Office) 
Campbel1  Park  Offices 
Canberra  ACT  2fC",  Australia 


Dr .  Is-  rc  PcJ;ir 
FduenMonM  Testing  f.-rvice 
Princeton,  MJ  05**5° 


r  ip»  .  .1.  J*  on  IW  anger 

T»-  >ir  i  n/»  Development  Division 

C  n.-tdier  Forres  Training  rystrn 

CFT’HC.  CFP  Trent  on 

Ast  r  • ,  Ontario  KOk  WO 

Dr.  Mt-nuol Rlrrnbfum 

School  of  Education 

To l  Aviv  University 

T»» l  Aviv,  Pimat  Aviv  6°97P 

1  sr  ->e) 

Or.  Werner  Birke 

0'*zWPs  im  rtrejtkri'eft.pnmt 

Post f sob  20  50  0? 

rj_r  ■'on  nonn  2 

MFP-T  GERMANY 

Dr.  R.  Dnrrel  Dock 
Department  of  Education 
University  of  Chicago 
Chicago,  TL  6(16*7 

Liaison  Scientists 
Office  of  Naval  Research, 

Br.nch  Office  ,  London 
Pox  79  FPO  New  York  09910 

Dr .  Robert  Brennrn 

American  College  Testing  Programs 

p.  0.  Box  168 

low?  City.  IA  522UP 

DR.  JOHN  F.  BROCK 

Honeywell  Systems  A  Research  Center 
(MM  1 7 -?  3 1 P ) 

2600  Ridgeway  parkway 
Minneapolis,  MN  SS*I1? 

DR.  C.  VICTOR  BUNDERSON 
WTCAT  INC. 

UNIVERSITY  PLAZA.  SUITE  10 
1160  SO.  STATE  ST. 

OREM,  UT  8*057 

Dr.  'ohn  P.  Carroll 
Psychometric  Lab 
Umv .  of  No.  Carol  ine 
Davie  Hall  01.  A 
Chapel  Hill,  HO  2781*4 

C.. erics  Myers  Library 
Livingstone  House 
Livingstone  Road 
Stratford 
London  E15  2LJ 
ENGLA  HD 

Dr .  Norman  Cl  iff 
Dept,  of  Psychology 
tjniv.  of  So.  California 
University  P.-rk 
Los  Angflcs,  CA  9OOO7 

Dr.  William  E.  Coffman 
Director,  Iowa  Testing  P-ograms 
“**«  Lindquist  Center 
University  of  Town 
Tow"*  city,  TA  522*1? 

Dr.  Meredith  P.  Crawford 
Amcri'an  Psychological  Association 
ipnr  17th  root. ,  N.W. 

Washington,  DC  PW’f 


1  Dr.  H->ns  frombaj*  1 

Education  Research  Center 
University  of  Leydrr. 

Rorrh-'.avrl .  an  2 
??VI  EN  L'*ydm 

The  NFTHFRLAHDT.  1 

1  Dr.,Fri»7  Dr; sgow 

Yale  School  of  °rp; nl zr* ion  and  Man.igen* 
Yale  University 
nox  1  a 

Few  Haven,  CT  0fr?f 

1  Mike  Durmeycr 

instruct  ion.'.l  Program  Development 
Building  90 
NET-PDCD 

Great  l.a k"S  NTC,  TL  60(10* 

1  ERIC  Facility-Acquisitions 
*877  Rugby  Avenue 
Bethesdi.,  MD  2001** 

1  Dr.  Benjamin  A.  Fairbank,  Jr. 

McFann-Gray  l  Associates,  Tnc. 

5*25  Callaghan 
Suite  ??5 

San  Antonio,  Texr* 

1  Or.  Leonard  Faldt 

Lindquist  Center  for  Meiisurmcnt 
University  of  Iowa 
Towa  City,  IA  5 22*2 

1  Dr.  Richard  L.  Ferguson 

The  Americrn  College  Testing  Prograr 
P.0.  Box  166 
Iowa  City,  IA  522*40 

1  Dr.  Victor  Fields 
Dept,  of  Psychology 
Montgomery  College 
Rockville,  MD  20850 

1  Univ.  Prof.  Dr.  Gerhard  Fischer 
Liebiggnss*  5/3 
A  1010  Vienna 
AUSTRIA 

1  Professor  Donald  Fitzgerald 
University  of  New  England 
Armldalc,  New  South  Wales  2351 
AUSTRALIA 

1  DR.  ROBERT  GLASEP 
LRDC 

UNIVERSTJY  OF  PITTSBURGH 
3979  o':j;ra  street 
PITTSBURGH,  PA  15213 

1  Dr.  Daniel  Gopher 

Industrial  4  Management  Engineering 
Technion-Isrsel  Tnstitute  of  Technology 
Haifa 
ISRAEL 

1  Dr.  Bert  Green  1 

Johns  Hopkins  University 
Department  of  Psychology 
Charles  4  3*th  Street 
Baltimore,  MD  21218 

1  Dr,  Ron  Hamblet.on 

School  of  Education  1 

University  of  Kassechusetts 
Amherst,  NA  "100? 

1  Dr.  Delwyn  Harnisch 

University  of  Illinois  1 

2*2b  Education 
Urbane ,  TL  61F01 


Dr.  Ch  fit  or  II  rrls 
School  of  Fdu:.  1  ion 
Uni  versify  of  Cliforn.. 
r.  -nt;.  I  ;  rb;r..,  '  A  1°r 

Dr.  Dustin  I1.  Huston 
Wient  ,  *nc . 

Pox  9 f( 

Orrm.  IIT  c*40r.'' 

Dr. Lloyd Humphreys
Department of Psychology
University of Illinois
Champaign, IL 61820

Dr. Steven Hunka
Department of Education
University of Alberta
Edmonton, Alberta
CANADA

Dr. Earl Hunt
Dept. of Psychology
University of Washington
Seattle, WA 98105

1 Dr. Jack Hunter
2122 Coolidge St.
Lansing, MI 48906

1  Dr.  Huynh  Huynh 

College  of  Education 
University  of  South  Carolina 
Columbia,  SC  29208 

1 Professor John A. Keats
University of Newcastle
AUSTRALIA 2308

1 Mr. Jeff Kelety
Department of Instructional Technology
University of Southern California
Los Angeles, CA 92007

1  Dr.  Stephen  Kosslyn 
Harvard  University 
Department of Psychology
Kirkland  Street 
Cambridge,  MA  02178 

1 Dr. Marcy Lansman
Department of Psychology, NI 25
University of Washington
Seattle, WA 98195

1 Dr. Alan Lesgold
Learning  R&D  Center 
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

1 Dr. Michael Levine
Department of Educational Psychology
210 Education Bldg.
University of Illinois
Champaign, IL 61801

Dr. Charles Lewis
Faculteit Sociale Wetenschappen
Rijksuniversiteit Groningen
Oude Boteringestraat 23
9712GC Groningen
Netherlands

Dr.  Robert  Linn 
College  of  Education 
University  of  Illinois 
Urbana, IL 61801

Dr. Frederick M. Lord
Educational Testing Service
Princeton, NJ 08540


1 Dr. James Lumsden
Department of Psychology
University of Western Australia
Nedlands W.A. 6009
AUSTRALIA

Dr. Gary Marco
Educational Testing Service
Princeton, NJ


Dr. Robert Tsutakawa
Department of Statistics
University of Missouri
Columbia, MO

Dr. David Vale
Assessment Systems Corporation
University Avenue
St. Paul, MN


Department of Psychology
University of Houston
Houston, TX

1 Dr. Samuel T. Mayo
Loyola University of Chicago
820 North Michigan Avenue
Chicago, IL 60611

1 Dr. Erik McWilliams
Education Dev. and Research
National Science Foundation
Washington, DC 20550

1 Professor Jason Millman
Department of Education
Stone Hall
Cornell University
Ithaca, NY 14853

1 Dr. Melvin R. Novick
Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242

1 Dr. Jesse Orlansky
Institute for Defense Analyses
400 Army Navy Drive
Arlington, VA 22202

1 Wayne M. Patience
American Council on Education
GED Testing Service, Suite 20
One Dupont Circle, NW
Washington, DC 20036

1 Dr. James A. Paulson
Portland State University
P.O. Box 751
Portland, OR 97207

1 MR. LUIGI PETRULLO
2431 N. EDGEWOOD STREET
ARLINGTON, VA 22207

1  Dr.  Steven  E.  Poltrock 
Department  of  Psychology 
University  of  Denver 
Denver, CO 80208


Dr. J. Ryan
Department of Education
University of South Carolina
Columbia, SC 29208

PROF. FUMIKO SAMEJIMA
DEPT. OF PSYCHOLOGY
UNIVERSITY OF TENNESSEE
KNOXVILLE, TN 37916

1 Frank L. Schmidt
Department of Psychology
Bldg. GG
George Washington University
Washington, DC 20052

1 Dr. Kazuo Shigemasu
University of Tohoku
Department of Educational Psychology
Kawauchi, Sendai 980
JAPAN

1 Dr. Edwin Shirkey
Department of Psychology
University of Central Florida
Orlando, FL 32816

1  Dr.  Richard  Snow 
School  of  Education 
Stanford  University 
Stanford, CA 94305


Dr. Thomas G. Sticht
Director, Basic Skills Division
HUMRRO
300 N. Washington Street
Alexandria, VA

DR.  PATRICK  SUPPES 

INSTITUTE  FOR  MATHEMATICAL  STUDIES  IN 
THE  SOCIAL  SCIENCES 
STANFORD  UNIVERSITY 
STANFORD,  CA 

Dr. Hariharan Swaminathan
Laboratory  of  Psychometric  and 
Evaluation  Research 
School  of  Education 
University  of  Massachusetts 
Amherst,  MA  0100? 

1 Dr. Brad Sympson
Psychometric Research Group
Educational Testing Service
Princeton, NJ 08541

Dr. Kikumi Tatsuoka
Computer Based Education Research Laboratory
252 Engineering Research Laboratory
University of Illinois
Urbana, IL 61801

1 Dr. David Thissen
Department of Psychology
University of Kansas
Lawrence, KS 66044


Dr. Howard Wainer
Division of Psychological Studies
Educational Testing Service
Princeton, NJ 08540

DR. SUSAN E. WHITELY
PSYCHOLOGY DEPARTMENT
UNIVERSITY OF KANSAS
LAWRENCE, KANSAS 66044

Wolfgang Wildgrube
Streitkraefteamt
Box 20 50 03
D-5300 Bonn 2
WEST GERMANY


Previous  Publications  (continued) 


77-6.  An  Adaptive  Testing  Strategy  for  Achievement  Test  Batteries.  October 
1977. 

77-5.  Calibration  of  an  Item  Pool  for  the  Adaptive  Measurement  of  Achievement. 
September  1977. 

77-4. A Rapid Item-Search Procedure for Bayesian Adaptive Testing. May 1977.

77-3. Accuracy of Perceived Test-Item Difficulties. May 1977.

77-2.  A  Comparison  of  Information  Functions  of  Multiple-Choice  and  Free- 
Response  Vocabulary  Items.  April  1977. 

77-1.  Applications  of  Computerized  Adaptive  Testing.  March  1977. 

Final  Report:  Computerized  Ability  Testing,  1972-1975.  April  1976. 

76-5.  Effects  of  Item  Characteristics  on  Test  Fairness.  December  1976. 

76-4.  Psychological  Effects  of  Immediate  Knowledge  of  Results  and  Adaptive 
Ability  Testing.  June  1976. 

76-3.  Effects  of  Immediate  Knowledge  of  Results  and  Adaptive  Testing  on  Ability 
Test  Performance.  June  1976. 

76-2.  Effects  of  Time  Limits  on  Test-Taking  Behavior.  April  1976. 

76-1.  Some  Properties  of  a  Bayesian  Adaptive  Ability  Testing  Strategy.  March 
1976. 

75-6.  A  Simulation  Study  of  Stradaptive  Ability  Testing.  December  1975. 

75-5. Computerized Adaptive Trait Measurement: Problems and Prospects.
November 1975.

75-4.  A  Study  of  Computer-Administered  Stradaptive  Ability  Testing.  October 
1975. 

75-3.  Empirical  and  Simulation  Studies  of  Flexilevel  Ability  Testing.  July 
1975. 

75-2.  TETREST:  A  FORTRAN  IV  Program  for  Calculating  Tetrachoric  Correlations. 
March  1975. 

75-1.  An  Empirical  Comparison  of  Two-Stage  and  Pyramidal  Adaptive  Ability 
Testing.  February  1975. 

74-5.  Strategies  of  Adaptive  Ability  Measurement.  December  1974. 

74-4.  Simulation  Studies  of  Two-Stage  Ability  Testing.  October  1974. 

74-3.  An  Empirical  Investigation  of  Computer-Administered  Pyramidal  Ability 
Testing.  July  1974. 

74-2. A Word Knowledge Item Pool for Adaptive Ability Measurement. June 1974.

74-1. A Computer Software System for Adaptive Ability Measurement. January
1974.

73-4.  An  Empirical  Study  of  Computer-Administered  Two-Stage  Ability  Testing. 
October  1973. 

73-3.  The  Stratified  Adaptive  Computerized  Ability  Test.  September  1973. 

73-2.  Comparison  of  Four  Empirical  Item  Scoring  Procedures.  August  1973. 

73-1.  Ability  Measurement:  Conventional  or  Adaptive?  February  1973. 

Copies  of  these  reports  are  available,  while  supplies  last,  from: 
Computerized  Adaptive  Testing  Laboratory 
N660  Elliott  Hall 
University  of  Minnesota 
75  East  River  Road 
Minneapolis  MN  55455  U.S.A. 


Previous  Publications 


Proceedings of the 1977 Computerized Adaptive Testing Conference.
July 1978.


Research Reports

81-4. Factors Influencing the Psychometric Characteristics of an Adaptive
Testing Strategy for Test Batteries. November 1981.

81-3. A Validity Comparison of Adaptive and Conventional Strategies for Mastery
Testing. September 1981.

Final Report: Computerized Adaptive Ability Testing. April 1981.

81-2. Effects of Immediate Feedback and Pacing of Item Presentation on Ability
Test Performance and Psychological Reactions to Testing. February
1981.

81-1. Review of Test Theory and Methods. January 1981.

80-5. An Alternate-Forms Reliability and Concurrent Validity Comparison of
Bayesian Adaptive and Conventional Ability Tests. December 1980.

80-4. A Comparison of Adaptive, Sequential, and Conventional Testing Strategies
for Mastery Decisions. November 1980.

80-3. Criterion-Related Validity of Adaptive Testing Strategies. June 1980.

80-2. Interactive Computer Administration of a Spatial Reasoning Test. April
1980.

Final Report: Computerized Adaptive Performance Evaluation. February
1980.

80-1. Effects of Immediate Knowledge of Results on Achievement Test Performance
and Test Dimensionality. January 1980.

79-7. The Person Response Curve: Fit of Individuals to Item Characteristic
Curve Models. December 1979.

79-6. Efficiency of an Adaptive Inter-Subtest Branching Strategy in the
Measurement of Classroom Achievement. November 1979.

79-5. An Adaptive Testing Strategy for Mastery Decisions. September 1979.

79-4. Effect of Point-in-Time in Instruction on the Measurement of Achievement.
August 1979.

79-3. Relationships among Achievement Level Estimates from Three Item
Characteristic Curve Scoring Methods. April 1979.

Final Report: Bias-Free Computerized Testing. March 1979.

79-2. Effects of Computerized Adaptive Testing on Black and White Students.
March 1979.

79-1. Computer Programs for Scoring Test Data with Item Characteristic Curve
Models. February 1979.

78-5. An Item Bias Investigation of a Standardized Aptitude Test. December
1978.

78-4. A Construct Validation of Adaptive Achievement Testing. November 1978.

78-3. A Comparison of Levels and Dimensions of Performance in Black and White
Groups on Tests of Vocabulary, Mathematics, and Spatial Ability.
October 1978.

78-2. The Effects of Knowledge of Results and Test Difficulty on Ability Test
Performance and Psychological Reactions to Testing. September 1978.

78-1. A Comparison of the Fairness of Adaptive and Conventional Testing
Strategies. August 1978.

77-7. An Information Comparison of Conventional and Adaptive Tests in the
Measurement of Classroom Achievement. October 1977.


-continued  overleaf- 


